Minding your Ps and Qs: How linguistic environment influences literacy

Through computational tools and experiments, the PhonPred project shed light on how we extract knowledge from our linguistic environment.

Languages are characterised by a variety of regular patterns that govern how words are formed and how they combine to express thoughts. Phonotactic regularities are constraints on how sounds are positioned within words and how they combine. For example, in English the sounds for ‘k’ and ‘n’ cannot occur together at the start of a word (think of the pronunciation of ‘knife’), whereas in German they can. Phonotactic regularities are complex. Many have exceptions, and in the broader language context there are further constraints on how sounds may be spelled and on how spelling may be influenced by meaning. Most language regularities are not taught in schools: research suggests we possess implicit knowledge of them, garnered by experience.
By identifying what language regularities exist, and what speakers know about them, the PhonPred project, supported by the Marie Skłodowska-Curie Actions, set out to answer questions about human learning and cognition. “Linguistic behaviour is determined by the language environment that humans are exposed to. Without understanding this environment, we can’t fully understand human behaviour,” explains Kathy Rastle, project supervisor from Royal Holloway, University of London, the project host.
A key finding was that people were consistently good at assimilating important regularities from their language environment. These span different levels of linguistic representation, for example, linking sound, spelling and meaning. While exceptions make it challenging to detect and learn these regularities, this research demonstrates that people do succeed. English-speaking adults, for instance, were able to classify a non-word like ‘domous’ as an adjective, because they have learned that the suffix ‘ous’ typically marks adjectives. The project also found differences in how individuals assimilate language intricacies, which are often attributable to the amount and quality of language experience.
“This means that manipulating the exposure that people experience may shape their language knowledge and so help facilitate reading and writing skills,” adds Ana Ulicheva, the MSCA fellow.

The science behind learning

PhonPred took two broad but interlinked approaches. Firstly, computational analysis of language structure helped uncover existing regularities in the linguistic system. The analysis was based on large collections of texts called ‘corpora’, supplemented with information about the sound structure of words and their meanings. Regularities of interest were then identified and characterised using computational tools, such as distributional semantics. Secondly, experiments were conducted to investigate how people use these regularities. Adult native speakers of English completed a series of tasks involving reading aloud, spelling and making decisions about real and nonsense words. Sample sizes varied depending on the task and statistical considerations, with some experiments using online crowdsourcing.
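
To make the idea of a probabilistic regularity concrete, the following is a minimal sketch in Python of how often a word ending predicts a grammatical category. It is not the project's method or data: the tiny hand-picked lexicon, the function names `suffix_category_probabilities` and `guess_category`, and the use of simple string endings in place of a proper morphological analysis are all assumptions made for illustration. A real analysis would draw on a large corpus.

```python
from collections import Counter

# Toy lexicon of (word, category) pairs, hand-picked for illustration only.
# It is NOT data from the PhonPred project.
TOY_LEXICON = [
    ("famous", "adjective"),
    ("nervous", "adjective"),
    ("curious", "adjective"),
    ("generous", "adjective"),
    ("couscous", "noun"),     # exception: ends in 'ous' but is a noun
    ("rendezvous", "noun"),   # exception: ends in 'ous' but is a noun
    ("kindness", "noun"),
    ("happiness", "noun"),
    ("darkness", "noun"),
]


def suffix_category_probabilities(lexicon, suffix):
    """Return the proportion of each category among words ending in `suffix`."""
    counts = Counter(category for word, category in lexicon if word.endswith(suffix))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def guess_category(lexicon, word, suffixes):
    """Guess a category for `word` from the longest matching ending, if any."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            probabilities = suffix_category_probabilities(lexicon, suffix)
            if probabilities:
                # Pick the category most often seen with this ending.
                return max(probabilities, key=probabilities.get)
    return None


if __name__ == "__main__":
    print(suffix_category_probabilities(TOY_LEXICON, "ous"))
    print(guess_category(TOY_LEXICON, "domous", ["ous", "ness"]))
```

In this toy lexicon, four of the six words ending in ‘ous’ are adjectives, so the nonsense word ‘domous’ is guessed to be an adjective even though the pattern has exceptions. The regularity is a tendency rather than a rule.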

Improving teaching methods

Despite the importance of language and literacy skills for employment prospects, research suggests that about 20 % of students in OECD countries, on average, do not attain the baseline level of proficiency in reading. By better understanding how we learn, and how knowledge differs among individuals and across languages, PhonPred’s results can contribute to the development of improved teaching methods. “Language regularities are more probabilistic than deterministic. This has implications for the measurement and interpretation of language knowledge and behaviour,” says Ulicheva. “Our work suggests that some variation amongst individuals seen in schools or clinics stems from the linguistic environment, and not inherent differences among individuals.”