CORDIS - EU research results

The reading brain as a statistical learning machine

Periodic Reporting for period 4 - STATLEARN (The reading brain as a statistical learning machine)

Reporting period: 2021-03-01 to 2022-02-28

Although written language is not part of our genetic endowment, literate adults process an impressive amount of information as they read, and they do so almost without error. How this happens is largely unknown, and represents a fundamental issue for theories of human learning. Building on data from nonhuman primates, human infants and psycholinguistic experiments on word-internal structure, STATLEARN tested the hypothesis that statistical learning is one of the fundamental cognitive mechanisms underlying visual word identification and reading. Human infants learn to chunk smaller perceptual units (e.g. oriented lines) into larger, meaningful objects (e.g. tools, faces), taking advantage of recurrent patterns in their distribution. As developing readers, they would apply this very same mechanism to a newly encountered type of visual object, i.e. letters. On this basis, they would build progressively higher-order orthographic units, which eventually make their visual word identification as adult readers astonishingly efficient.
STATLEARN tested this conjecture by combining techniques from Computational Linguistics, Experimental Psychology and Neuroscience. We carried out experiments with adults and children, using both behavioural methods and state-of-the-art technologies such as EEG, eye tracking and MEG. We tested natural reading as well as simulated learning of new writing systems in the lab. Overall, this rich set of experiments showed that reading and word learning do build on sensitivity to letter statistics, and do so from a very young age. This sensitivity seems to rest on a mechanism that is not specific to language and that is also at play, at least in a rudimentary form, in non-linguistic animals.
Given that reading is one of the most widespread human activities and is critical for navigating modern society, this project promises to have far-reaching impact. By providing new insight into how we acquire literacy, STATLEARN may inform how we diagnose and treat developmental and acquired dyslexia, and how written language is taught in schools. More generally, the project may shed new light on the remarkable learning and information-processing abilities of the human brain.
As outlined in detail in Section 1.1, we carried out behavioural, EEG, MEG and eye-tracking studies with adult skilled readers, children on their way to mastering literacy, and rats, using real linguistic materials as well as artificial lexicons, pseudocharacters and synthetic visual stimuli. More specifically, we worked on the following four fronts.
Words that share part of their internal structure (e.g. [mind]ful and [mind]less) are connected in the human brain and cognitive system. We carried out a series of chronometric experiments to test whether this is related to letter co-occurrence statistics: on this account, we would notice the presence of “mind” in “mindful” and “mindless” because the letters m, i, n and d often occur together in the language. The data gathered so far suggest that this is not the case.
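To make "letter co-occurrence statistics" concrete, the following minimal Python sketch counts adjacent letter pairs in a word list and estimates how often one letter follows another. It is an illustration only: the toy lexicon is made up from the examples in this summary, and the measures (bigram counts and forward transitional probabilities) are assumptions, since the summary does not specify which statistics the project actually computed.

```python
from collections import Counter


def bigram_stats(words):
    """Count adjacent letter pairs, and how often each letter starts a pair."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in words:
        for first, second in zip(word, word[1:]):
            pair_counts[first + second] += 1
            first_counts[first] += 1
    return pair_counts, first_counts


def transitional_probability(pair, pair_counts, first_counts):
    """Estimate P(second letter | first letter) from the counts."""
    if first_counts[pair[0]] == 0:
        return 0.0
    return pair_counts[pair] / first_counts[pair[0]]


# Hypothetical toy lexicon; a real analysis would use a large corpus.
toy_lexicon = ["mindful", "mindless", "kindness", "fairness",
               "driver", "dealer", "baker"]

pair_counts, first_counts = bigram_stats(toy_lexicon)
tp_nd = transitional_probability("nd", pair_counts, first_counts)
print(f"count(nd) = {pair_counts['nd']}, P(d | n) = {tp_nd:.2f}")
```

In a realistic setting the counts would typically be weighted by word frequency and extended to longer letter chunks; both choices are left out here to keep the sketch short.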
In a second series of experiments, we asked our participants to learn a set of novel words in the lab, and tested whether they relied on letter co-occurrence statistics in doing so. When the experiments involved familiar letters, participants tended to apply the statistics of their native language, rather than learning new regularities based on the novel words. When we used an unfamiliar alphabet instead, participants did seem to capture the co-occurrence patterns among the novel characters.
We also looked for brain signatures of sensitivity to recurring chunks of letters. We did so by embedding these chunks at a fixed period in a stream of visual events, and assessing whether the brain synchronises its rhythm to this same periodicity. The data suggest that this is the case, at least in areas typically devoted to higher-level vision (i.e. the left occipito-temporal cortex). We also obtained evidence that this sensitivity is enhanced by meaning: the brain responds more to recurring clusters of letters that also carry a consistent meaning (e.g. “ness” in “kindness”, “fairness” and “bitterness”, or “er” in “driver”, “dealer” and “baker”).
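The logic of this periodic-presentation approach can be sketched with a simulated signal. The Python example below is a toy illustration, not the project's analysis pipeline: the sampling rate, the presentation and chunk frequencies, the response amplitudes and the signal-to-noise measure are all assumptions chosen for clarity. It builds a noisy signal containing a response at the presentation rate and a weaker response at the chunk rate, then checks whether the spectrum shows a peak at the chunk rate relative to neighbouring frequencies.

```python
import cmath
import math
import random

SAMPLE_RATE = 100   # Hz (assumed)
DURATION = 10       # seconds (assumed); frequency resolution is 1 / DURATION
BASE_FREQ = 6.0     # Hz, assumed rate at which items are presented
CHUNK_FREQ = 1.2    # Hz, assumed rate at which the target chunk recurs


def dft_amplitude(signal, k):
    """Normalised magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) / n


def snr_at_bin(signal, k, n_neighbours=5):
    """Amplitude at bin k divided by the mean amplitude of nearby bins."""
    neighbours = [k + d for d in range(-n_neighbours, n_neighbours + 1) if d != 0]
    noise = sum(dft_amplitude(signal, b) for b in neighbours) / len(neighbours)
    return dft_amplitude(signal, k) / noise


rng = random.Random(0)  # fixed seed so the simulation is reproducible
n_samples = SAMPLE_RATE * DURATION
signal = [
    math.sin(2 * math.pi * BASE_FREQ * t / SAMPLE_RATE)           # response to every item
    + 0.5 * math.sin(2 * math.pi * CHUNK_FREQ * t / SAMPLE_RATE)  # response to the chunk
    + rng.gauss(0, 1)                                             # background noise
    for t in range(n_samples)
]

chunk_bin = round(CHUNK_FREQ * DURATION)  # frequency bin of the chunk rate
control_bin = round(2.3 * DURATION)       # a frequency with no periodic input
snr_chunk = snr_at_bin(signal, chunk_bin)
snr_control = snr_at_bin(signal, control_bin)
print(f"SNR at chunk rate: {snr_chunk:.1f}, at control rate: {snr_control:.1f}")
```

A ratio well above 1 at the chunk rate, with no comparable peak at the control frequency, is the kind of signature that would indicate synchronisation to the embedded periodicity.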
We complemented this evidence from adults by looking into when sensitivity to letter statistics emerges in children learning to read. Eye-tracking data on text reading suggest that, as early as Grade 3, children are sensitive to the frequency with which given letter combinations occur in the language, and that this information guides their visual exploration of the written text. We also investigated whether this sensitivity shows up in brain signatures, and particularly in the capacity of the brain to entrain to external stimuli.

This set of data has generated a large number of conference presentations and papers, which are reported in the Publication and Dissemination sections. Moreover, several papers are in the making, as detailed below:
1. Lelonkiewicz et al., Morphemes as letter chunks: Linguistic information enhances the learning of visual regularities. [submitted]
2. Pescuma et al., Automatic Morpheme Identification Across Development: Magnetoencephalography (MEG) Evidence from Fast Periodic Visual Stimulation. [submitted]
3. Hasenacker et al., Prediction at the intersection of sentence context and word form: Evidence from eye-movements and self-paced reading. [submitted]
4. De Rosa et al., Co-occurrence statistics affect letter processing. [in preparation]
5. De Rosa et al., Selective Neural Entrainment Reveals Hierarchical Tuning to Linguistic Regularities. [in preparation]
6. Pescuma et al., Eye movements during natural reading reveal sensitivity to orthographic regularities in children. [in preparation]
7. Pescuma et al., EyeReadIt: A developmental eye-tracking corpus of text reading in Italian
8. Lelonkiewicz et al., Spontaneous human-like string processing in rats. [in preparation]
9. Lelonkiewicz et al., Lexical diversity and word learning based on letter statistics. [in preparation]
10. Ktori et al., Affix-like chunks determine lexical plausibility in children. [in preparation]
11. Ktori et al., Morpheme position coding in compounds. [in preparation]
12. Franzon and Crepaldi, Feature coding in morphological contrasts. [in preparation]

In addition, two conference presentations are under way:
1. European Society for Cognitive Psychology (ESCoP), August 2022 (by Maria Ktori)
2. Interdisciplinary Advances in Statistical Learning, June 2022 (by Davide Crepaldi)
The data collected in the second part of the project, and more generally over the course of STATLEARN, clearly indicate that readers are sensitive to the statistical structure generated by how letters co-occur in the written language, and to how informative these co-occurrences are about word meaning. This calls for a paradigm shift in reading research. The brain and cognitive processes behind reading are not only related to the fact that written text encodes language; they also reflect the fact that reading generates a novel and somewhat independent visual domain, which the human brain/mind captures through general-purpose learning algorithms based (at least in part) on probabilistic associations between (chunks of) letters. In line with this novel theoretical perspective, sensitivity to the statistical information brought about by written language emerges early in children learning to read and, at least in a rudimentary form, in non-linguistic animals.