Community Research and Development Information Service - CORDIS

Periodic Report Summary 1 - SPEECH IN CONTEXT (The neural implementation of contextual influences in speech perception)

Speech sounds are highly variable: the same speaker produces a given sound differently on different occasions, and different speakers produce it differently from one another. In addition, speech sounds can differ as a result of different listening conditions. In everyday situations, however, listeners are remarkably robust to such variation. The project “Speech in Context” was developed to investigate the neural implementation of a number of processes that help listeners accommodate variation in perceptual input. To this end, the researcher spent the last two years seconded to the University of California at Berkeley, working in the labs of Professor Keith Johnson and Dr Edward Chang. In the lab of Dr Edward Chang (a neurosurgeon), and in collaboration with Professor Keith Johnson, the researcher gained experience with the method of electrocorticography (ECoG). ECoG is an invasive procedure in which epilepsy patients are implanted subdurally with electrode grids for medical purposes (localization of epilepsy foci). During their hospitalization these patients are asked whether they are willing to participate in fundamental research. Subdural recordings provide spatially and temporally very precise measurements of cortical activity, which is especially important for research into speech perception, a process that is inherently fast and spatially focal. ECoG was the main methodology of the project, allowing for the investigation of various mechanisms through which contextual information influences the cortical processing of speech sounds. The researcher has mainly worked on six experimental projects and the writing of one book chapter, all listed below.

Work performed
The first project, “speaker normalization”, addresses the fact that, as mentioned above, different speakers produce the same speech sound (e.g., the vowel /o/) differently. Listeners, however, manage to adjust their perception to accommodate such between-speaker differences. For this project, the researcher created a synthesized /o/-/u/ continuum (i.e., with “ambiguous” sounds that sound a little like /u/ and /o/ at the same time). The distinction between /o/ and /u/ is mainly carried by the “first formant” (F1), which reflects an important resonance frequency of the human vocal tract. These sounds were presented after sentences spoken by a speaker with a long vocal tract (and hence a low first formant) and after sentences from a speaker with a short vocal tract (and hence a high first formant). Behavioral results showed that listeners perceptually adjust for vocal tract differences in the sentences preceding the target sounds: after hearing a sentence from a speaker with a high first formant, the subsequent target sound was more often perceived as /u/ (that is, the option with a low first formant). The researcher developed the speech materials, pretested them on healthy participants, and collected data with these materials from 8 ECoG patients.
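The behavioral pattern can be sketched as a simple toy model in which the target's first formant is judged relative to the context speaker's typical F1, so the same physical target sounds relatively "lower" (more /u/-like) after a high-F1 context. All parameter values below are illustrative assumptions, not values fitted to the project's data.

```python
import math

def p_u_response(target_f1, context_mean_f1, slope=0.02):
    """Illustrative probability of perceiving /u/ (the low-F1 vowel).
    Normalization is modeled by judging the target F1 relative to the
    context speaker's mean F1; slope and frequencies are made-up values."""
    relative_f1 = target_f1 - context_mean_f1
    return 1.0 / (1.0 + math.exp(slope * relative_f1))

# The same ambiguous target (F1 = 400 Hz) after two different contexts:
after_low_f1_context = p_u_response(400, context_mean_f1=350)   # long vocal tract
after_high_f1_context = p_u_response(400, context_mean_f1=450)  # short vocal tract

# More /u/ responses after the high-F1 speaker, as in the behavioral results.
assert after_high_f1_context > after_low_f1_context
```

The key design choice is that categorization depends only on the *relative* F1, which is one common way of formalizing extrinsic speaker normalization.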
The second project, “compensation for room acoustics”, was developed to investigate how speech spoken in different rooms affects listeners' perception of speech sounds. It relies on the same approach and materials as the first project (speaker normalization), except that the context sentences are manipulated to sound as if they were spoken in different rooms (instead of by different speakers). The researcher developed the speech materials, pretested them on healthy participants, and collected data with these materials from 4 ECoG patients.
A third project, “rate normalization”, focused on listeners' ability to perceptually adjust for differences in speech rate. In English, vowel duration is the main cue distinguishing the spoken words “heard” (long vowel) from “hurt” (short vowel). Interestingly, when listeners hear a sound whose duration is ambiguous between the two options, the speech rate of a contextual sentence can influence which word is perceived. In a quickly spoken context, a vowel with an ambiguous duration is perceived as relatively long, and hence leads to perception of the word “heard”. Conversely, a slowly spoken sentence causes listeners to perceive an ambiguous sound as relatively short, leading to an increased proportion of “hurt” responses. For this project, the researcher developed and pretested materials on healthy participants, and collected data on the task from 1 patient. More ECoG data will be collected in the months to come.
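The rate effect can be sketched in the same toy-model style: the physical vowel duration is rescaled by the context speech rate before being compared to a category boundary, so a fast context makes the same duration sound relatively long. The rates, boundary, and slope below are invented for illustration and are not the project's stimulus values.

```python
import math

def p_heard(vowel_ms, context_syllables_per_s,
            baseline_rate=4.0, boundary_ms=120, slope=0.1):
    """Illustrative probability of reporting "heard" (the long vowel).
    The raw duration is rescaled by the context rate relative to an
    assumed baseline rate; all parameters are made-up values."""
    perceived_ms = vowel_ms * (context_syllables_per_s / baseline_rate)
    return 1.0 / (1.0 + math.exp(-slope * (perceived_ms - boundary_ms)))

# The same ambiguous 120 ms vowel after a fast vs. a slow context:
after_fast = p_heard(120, context_syllables_per_s=6.0)  # mostly "heard"
after_slow = p_heard(120, context_syllables_per_s=3.0)  # mostly "hurt"
assert after_fast > after_slow
```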
A fourth project, “laterality of normalization”, investigates the lateralization of context effects. The researcher used the materials developed for the first and third projects (speaker and rate normalization) to investigate the differential contributions of the two hemispheres to normalization processes. When sounds are presented to only the left or the right ear, initial processing of those sounds is more dominant in the contralateral hemisphere. Previous research has suggested that context effects may be more dominant in the right hemisphere. To investigate the lateralization of contextual influences, the researcher presented context and target sentences to the left ear, the right ear, or both ears, in a group of 25 healthy participants. These data will be analysed in the next couple of months.
The fifth project, “phoneme restoration”, investigates listeners' ability to perceptually restore missing information. When listening to speech, particular speech sounds are sometimes physically inaudible because they are occluded by background noise (clattering dishes, a car horn, or a cough). Somehow, listeners are often unaware of such instances. In this project the researcher collaborated with Dr Matthew Leonard and Dr Edward Chang to investigate such effects. The researcher carried out an important part of the analyses of neural patterns of activity and played an important role in writing the manuscript.
The sixth project focused on the processing of “speech in noise”. Previous research has suggested that listeners may invoke their production system to resolve perceptual ambiguities, by means of a form of analysis by synthesis (as proposed by the “motor theory of speech perception”). It has been suggested that motor-related regions become especially involved under challenging listening conditions. In this project the researcher presented 5 ECoG patients with speech sounds (individual syllables like “ba”, “ka”, “sha”, etc.) at various levels of background noise and asked them to indicate which speech sound they heard. The researcher has analysed the behavioral and ECoG data, and a manuscript is currently being written.
In addition to these experimental projects, the researcher has written a book chapter, “The cortical processing of speech sounds in the temporal lobe”, which has now been submitted for review.

Main results, and expected impact
For the first and second projects, “speaker normalization” and “compensation for room acoustics”, the researcher is currently analysing the ECoG data. Initial analyses have revealed that normalization effects (that is, the perceptual warping of speech sound representations towards a speaker's expected formant range and/or the expected room acoustics) occur at very early cortical levels: already at the level of the superior temporal gyrus (STG), vowel representations are modified to adjust for expected speaker and room differences. In the months to come the researcher will finalize these analyses. The two projects are to be combined in a single overarching publication, aimed for submission to a high-ranking journal around January.
For the third project, the behavioral results indicate that listeners indeed compensate for contextual speaking rate. The researcher will analyse the data from the single ECoG participant in the weeks to come; analyses of this patient's behavioral data already reveal strong and reliable normalization effects, as expected. Further ECoG data will be collected over the next months. The work is aimed for submission to a high-ranking journal around May 2017.
For the fourth project, the data from 25 (non-patient) participants have been collected and will be analysed in the months to come. The work is aimed for submission to a focused language journal such as “Brain and Language”.
For the fifth project, “phoneme restoration”, ECoG recordings showed that listeners use preceding acoustic information to neurally “fill in” missing information. That is, when a listener reports having heard, e.g., an /s/ where in fact some other, occluding sound was presented, activity in early auditory cortex (the superior temporal gyrus, STG) displays the same processing pattern as when participants actually heard a real /s/. These data have been presented at two international neuroscience meetings (SNL and SfN 2015), and the work has now been accepted for publication in “Nature Communications”, with the researcher as shared second/third author. The work has implications for our understanding of speech perception in general, but also for pathologies associated with auditory hallucinations, such as schizophrenia.
For the sixth project, the researcher has reached the final analysis stage. The results show that motor-related regions are indeed activated upon hearing speech sounds in this task. However, although activity in auditory regions such as the STG allows for classification of the different speech sounds, representations in the motor-related regions are not speech-sound specific; that is, they do not represent the different speech sounds differently. These findings suggest that the involvement of motor regions in speech perception is not the result of the kind of analysis by synthesis proposed by the motor theory of speech perception. The researcher has presented this work as a talk at the international meeting of the Society for the Neurobiology of Language. The work will be submitted to a high-ranking journal around November 2016.
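The logic of this classification result can be illustrated with a small simulation. This is not the project's actual pipeline: the data are simulated, and the classifier (a nearest-class-mean decoder over a trials-by-electrodes matrix) is an assumed stand-in for whatever decoding method was used. The point is only that a region whose activity differs by syllable decodes above chance, while a region that responds equally to all syllables decodes at chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_electrodes, n_syllables = 300, 20, 3
labels = rng.integers(0, n_syllables, n_trials)

# "STG-like" region: each syllable evokes a distinct activity pattern.
class_means = rng.normal(0, 1, (n_syllables, n_electrodes))
stg = class_means[labels] + rng.normal(0, 1, (n_trials, n_electrodes))
# "Motor-like" region: active for speech, but identical for every syllable.
motor = rng.normal(1, 1, (n_trials, n_electrodes))

def cv_accuracy(X, y, folds=5):
    """Nearest-class-mean decoding with simple cross-validation."""
    idx = np.arange(len(y))
    rng.shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        means = np.stack([X[train][y[train] == k].mean(0)
                          for k in range(n_syllables)])
        dists = ((X[test][:, None, :] - means[None]) ** 2).sum(-1)
        correct += (dists.argmin(1) == y[test]).sum()
    return correct / len(y)

# STG-like data classifies well above chance (1/3); motor-like data does not.
```

Under this simulation, the pattern mirrors the reported finding: decodability in the auditory-like region, chance performance in the motor-like region.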

Life Sciences