
Prediction in Speech Perception and Spoken Word Recognition

Final Report Summary - PSPSWR (Prediction in Speech Perception and Spoken Word Recognition)

The mechanisms human listeners employ to extract meaningful linguistic content from the continuous, physical speech signal remain poorly understood despite decades of investigation. Recent models have emphasized the importance of predictive mechanisms during perception. The idea is that listeners generate expectations of what they will hear next and that these expectations are predicated upon knowledge of their native language phonology. Such models comprise four components: a) continuous uptake of the incoming speech signal, b) predictive mechanisms that generate hypotheses/expectations, c) matching of hypotheses against the signal and d) generation of an error signal when predictions and signal mismatch. The goal of the project was to further understand these mechanisms and their perceptual and neurophysiological correlates by combining insights from linguistics, cognitive neuroscience and psychology. The linguistic phenomenon of sibilant harmony in Basque served as the test case. Basque is a language isolate whose largest concentration of speakers resides in Northeastern Spain. All experiments were conducted at the Basque Center on Cognition, Brain and Language (BCBL), a state-of-the-art research facility that focuses on the perceptual and brain bases of language. The BCBL is located in the heart of the Basque Country in Donostia-San Sebastián, Spain, providing the ideal geographic location and optimal technical resources and staff to carry out the line of investigation detailed in the proposal. The coordinator has extensive training in formal linguistic theory, cognitive neuroscience and experimental psychology, providing the interdisciplinary foundation necessary for carrying out the research outlined in the current project.
First, the project aimed to contribute to our understanding of the perceptual and neurophysiological mechanisms that underlie our ability to comprehend spoken language so effortlessly by integrating data from a less well-studied language. The second set of objectives was to pursue an integrated approach to spoken word comprehension by establishing collaborations both within Spain and within the European Union more generally. To that end, the coordinator has established national and international collaborations with established researchers and has continued and reinforced existing collaborations within the larger European community.
To advance our understanding of the perceptual mechanisms employed during online speech comprehension, auditory experiments were conducted that entailed either overt responses from participants or measurements of their ongoing electrical brain activity. Due to constraints, only Basque pseudowords were used, which also allowed for better control over stimulus characteristics. Accordingly, ~400 Basque pseudowords were created, recorded and acoustically measured. Subsequently, a series of phoneme-monitoring and EEG experiments were conducted. The experiments were conducted under the guidelines of the Institutional Review Board at the BCBL. The data were analyzed in accordance with standard practices within cognitive psychology and cognitive neuroscience.
In a series of behavioral phoneme-monitoring tasks, participants listened to Basque pseudowords and responded when they heard the sound for which they were monitoring. There were three conditions: match (same place of articulation (PoA), e.g. usatsu), mismatch (different PoA, e.g. uzatsu) and control (non-sibilant /f/, e.g. ufatsu). We found that mismatch items showed longer reaction times than control and match items across two distinct experiments. Furthermore, to determine the time-course of phonological predictions, each condition also had two different lengths of stimuli: in the Short condition, the sibilants were the onsets of adjacent syllables, and in the Long condition, there was an intervening syllable. Mismatch items showed reliably longer reaction times compared to Control and Match items at both lengths (see Figure 1). There was no main effect of length, suggesting that the strength of these predictions does not dissipate at longer phonological distances. Subsequently, more complex analyses were conducted; in particular, linear mixed-effects models (LMEs) were applied. These results demonstrate that listeners construct predictions of what they will hear next and that these predictions reflect abstract properties of their native language phonology. To the best of our knowledge, this is only the second psycholinguistic test using long-distance phonological processes, and the first to use long-distance phonological properties that operate over consonants.
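The 3 (condition) x 2 (length) design described above can be sketched as a simple descriptive summary. This is not the project's analysis (the report only states that linear mixed-effects models were applied); it is a minimal illustration of how a "mismatch cost" could be computed per length. The function names, the trial layout and every reaction time below are hypothetical placeholders, not data or results from the project.

```python
from statistics import mean

# Hypothetical trial records: (condition, length, reaction time in ms).
# All values are invented placeholders for illustration only.
toy_trials = [
    ("match", "short", 500.0), ("match", "short", 520.0),
    ("mismatch", "short", 560.0), ("mismatch", "short", 580.0),
    ("control", "short", 505.0), ("control", "short", 515.0),
    ("match", "long", 510.0), ("match", "long", 530.0),
    ("mismatch", "long", 570.0), ("mismatch", "long", 590.0),
    ("control", "long", 512.0), ("control", "long", 524.0),
]


def cell_means(trials):
    """Return the mean reaction time for each (condition, length) cell."""
    cells = {}
    for condition, length, rt_ms in trials:
        cells.setdefault((condition, length), []).append(rt_ms)
    return {cell: mean(rts) for cell, rts in cells.items()}


def mismatch_cost(means, length, baseline="match"):
    """Difference in mean RT between mismatch and a baseline condition."""
    return means[("mismatch", length)] - means[(baseline, length)]


toy_means = cell_means(toy_trials)
for length in ("short", "long"):
    print(length, mismatch_cost(toy_means, length),
          mismatch_cost(toy_means, length, baseline="control"))
```

A cost of similar size at both lengths would correspond, descriptively, to the absence of a length effect reported above; an inferential claim would still require a model with participant and item variability, such as the LMEs the report mentions.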
Behavioral measures, while invaluable for constructing models of spoken language processing, provide only an endpoint measure of the underlying processes. To measure pre-decision responses, a continuous measure is needed. Consequently, to determine the temporal dynamics of phonological prediction, subsequent EEG experiments were conducted. Participants listened to Basque pseudowords and responded only when they heard a distractor sinusoid. A reliable increase in positivity over centro-parietal electrodes was observed in the Mismatch condition compared to the Match and Control conditions. This positivity began 75 ms after the onset of the second sibilant, indicating that these phonological predictions are exerted early in the electrophysiological response (see Figure 2). Additionally, there has been recent interest in localizing the underlying neurophysiological mechanisms of prediction in low-frequency activity. This motivated a reanalysis of the electrophysiological data in the time-frequency domain. The analysis is still in progress, but to date, the Control condition (the condition in which no prediction along this dimension is possible) shows a relative increase in oscillatory power in the low-frequency portion of the spectrum (see Figure 3). Importantly, this difference is seen prior to the onset of the second sibilant, suggesting differential activity before that point between the conditions in which a prediction can be made (i.e. Match and Mismatch) and the one in which a prediction cannot be made (i.e. Control). Further work, however, is necessary to validate these results. If these findings hold, they have the potential to provide a new way of understanding these predictive mechanisms and to inform neurophysiologically plausible models of them.
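To make the notion of "oscillatory power in the low-frequency portion of the spectrum" concrete, the following is a minimal sketch of band-limited power computed with a plain discrete Fourier transform over a fixed window. The report does not state which time-frequency method, sampling rate or frequency band was used, so the 250 Hz sampling rate, the 2-8 Hz band, the one-second window and the function name `band_power` are all assumptions, and the signals are synthetic sinusoids rather than EEG data.

```python
import cmath
import math


def band_power(signal, sample_rate_hz, low_hz, high_hz):
    """Mean DFT power of `signal` over bins whose frequency lies in [low_hz, high_hz]."""
    n = len(signal)
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n  # frequency of DFT bin k
        if low_hz <= freq <= high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            powers.append(abs(coeff) ** 2 / n ** 2)
    if not powers:
        raise ValueError("no frequency bins fall inside the requested band")
    return sum(powers) / len(powers)


# Synthetic one-second windows (assumed 250 Hz sampling), standing in for a
# pre-onset analysis window; these are not recordings from the project.
SAMPLE_RATE_HZ = 250
times = [i / SAMPLE_RATE_HZ for i in range(SAMPLE_RATE_HZ)]
slow_wave = [math.sin(2 * math.pi * 4 * t) for t in times]   # 4 Hz component
fast_wave = [math.sin(2 * math.pi * 20 * t) for t in times]  # 20 Hz component

low_power_slow = band_power(slow_wave, SAMPLE_RATE_HZ, 2, 8)
low_power_fast = band_power(fast_wave, SAMPLE_RATE_HZ, 2, 8)
print(low_power_slow > low_power_fast)
```

In practice such a comparison would be made per condition on windows preceding the second sibilant, typically with a dedicated time-frequency method (for example wavelets) rather than a single fixed-window DFT; the sketch only illustrates the quantity being compared.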
In a similar line of research outside the direct purview of the project, in collaboration with Andrea E. Martin (University of Edinburgh, UK) and Arthur G. Samuel (Stony Brook University, USA / BCBL, Spain), we have investigated the role of morphosyntactic grammatical gender agreement in Spanish on the processing of vowels. In a series of vowel identification studies, we find that identification of vowels is strongly dependent upon the agreement relationship between a noun and an adjective. We interpret these results in line with the predictions of the current project. Namely, listeners use various aspects of their linguistic knowledge to generate expectations of what they will hear next, and these expectations feed back to and shape lower-level perceptual processes.
The final results provide the foundation for future investigations and more specific tests of models that incorporate mechanisms such as prediction, feedback and hypothesis generation. To date, little is understood about these predictive mechanisms during online speech perception, even as we continue to learn more about such mechanisms in related fields. Moreover, extending previous work on prediction in related disciplines into the current domain has the potential to substantially shape how we reconcile research questions in speech with those in related fields. This has the long-term impact of informing our understanding of various speech-related deficits. There is no website, URL or established public promotion for the project.