The challenge of reduced pronunciation variants in conversational speech for foreign language listeners: experimental research and computational modeling

Final Report Summary - FOREIGNCASUALSPEECH (The challenge of reduced pronunciation variants in conversational speech for foreign language listeners: experimental research and computational modeling)

Words are often produced with fewer and weaker segments in everyday conversations than in formal speech situations. For instance, the American English word ‘yesterday’ may be pronounced like /jɛʃej/ and ‘salami’ like /slɑ:mi/ in casual speech. We documented that learners of a foreign language have great difficulty understanding these reduced word pronunciation variants. For instance, they judge over 50% of words showing simple vowel deletion as pseudowords, instead of as real words, and they make many errors for reduced words in transcription tasks. Learners typically do not consider the possibility that words are reduced; they interpret reduced words as words with the number of segments that were realized (i.e. if a word of 7 segments is reduced to a 5-segment word, learners report having heard a 5-segment word, unlike natives, who reconstruct the missing segments).
In this project, we investigated how learners process reduced words, comparing them with native listeners. We focused on Spanish, Dutch, and Mandarin learners of English and on Dutch learners of French, comparing them with native speakers of English and of French. Some of the comprehension experiments that we conducted used new experimental techniques (or new combinations of them) or new statistical techniques to analyse the data.
We demonstrated that, when trying to understand reduced word pronunciation variants, learners are less able than native listeners to take the meaning and the structure of the sentence into account. They often interpret reduced words as words that do not match the context at all. Reduction is therefore highly detrimental to these listeners, blocking sentence comprehension.
Learners are also less able than native listeners to interpret the details of the speech signal, such as the exact duration or quality of a segment. Native listeners rely heavily on this type of acoustic information in order to comprehend reduced word pronunciation variants. For instance, whereas native listeners of English can still distinguish between ‘can’ and ‘can’t’ when the /t/ of ‘can’t’ is reduced, based on the duration and quality of the vowel, learners of English with Spanish or Mandarin as their native language have problems distinguishing these two words. Similarly, Spanish and Dutch learners have more difficulty than native listeners distinguishing between ‘sport’ and ‘support’ when the schwa of ‘support’ is short. Learners have less difficulty interpreting a given type of acoustic detail if that detail also plays a role in their native language.
In addition, learners rely on the frequencies of occurrence of word pronunciation variants in their linguistic input. That is, they recognize the reduced variant of a word more quickly if this variant is more frequent. They behave like native listeners in this respect, but, in the end, this is a disadvantage: learners’ (classroom) input differs from native listeners’ input, and learners are therefore not well tuned to everyday speech in their foreign language.
In order to be able to conduct this research, we needed to know the reduction patterns of the languages that the learners acquired and of their native languages. We therefore also conducted several studies of casual French and casual Dutch, documenting in detail some highly frequent reduction patterns.
Finally, we developed a theory of word comprehension that was computationally implemented. This model, called Diana, takes the speech signal as its input. Diana is the first model that can accurately simulate how quickly a native listener of Dutch or English recognizes a word or classifies a word as a real word or a pseudoword. Unlike many other models, Diana assumes that every word is represented in the listener’s mental lexicon with a large amount of acoustic detail. This feature of the model is based on findings from several experiments that we ran in this project.