Final Report Summary - OLDER LISTNERS (Listening in noise with older native and non-native speakers: The time-line for segregating speech from noise, real-time lexical processing of spoken-words, and the identification of verbal emotions.)
In almost every activity of daily living, older adults need to communicate with members of their family, friends, caregivers, and others to maintain their health and optimize their quality of life. Difficulty in hearing speech or following conversations, particularly when other voices or interfering sounds are present, is one of the most common complaints of older adults, especially when listeners are operating in their second language. These impairments often generate feelings of isolation, loneliness, powerlessness, and disenfranchisement, and result in declines in activities of daily living, mood, and quality of life. Still, the relative roles that age-related cognitive and sensory changes play in these difficulties remain debated. A better understanding of these processes can inform interventions aimed at more effective communication strategies. This, in turn, will increase older adults' ability to enjoy a broad range of activities and to participate successfully in social situations, improving their quality of life.
To address these issues, the project had three themes: Theme 1, the timeline for creating an auditory stream against a noisy background, that is, the time needed to segregate target speech from noise; Theme 2, the timeline for lexical processing of speech tokens in noise, that is, the time needed to differentiate target speech from its sound-sharing competitors in adverse conditions; and Theme 3, the identification of the emotional content of speech, based on both lexical content and prosody. The research on the three themes has provided important data and has led to papers in the academic literature, presentations at academic conferences, and a variety of outreach activities to disseminate the knowledge to the public.
Theme 1. To comprehend what is being said by an individual, listeners first have to perceptually segregate the target speech from other competing sound sources. If the auditory and/or cognitive systems of older adults are less efficient at accomplishing this task, they will be at a disadvantage vis-à-vis younger adults in auditory environments with competing sound sources. The studies conducted in this project compared the ability of younger and older, native and non-native listeners to benefit from a delay between the onset of an auditory masker and the onset of a speech target in a word-recognition task, for two different types of maskers: steady-state speech-spectrum noise and multi-talker babble. Varying this delay allows us to derive a function relating word accuracy to word-onset delay. This function represents the timeline for stream segregation.
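The sketch below (in Python) shows how such an accuracy-by-delay function could be fitted to estimate a segregation timeline per masker type. It is a minimal illustration only: the saturating-exponential form, the delays, and the accuracy values are assumptions for demonstration, not the project's analysis code or data.

```python
# Minimal sketch (not the project's actual analysis) of relating word-recognition
# accuracy to word-onset delay to characterize a stream-segregation timeline.
# All delays, accuracies, and the chosen saturating-exponential form are
# illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def segregation_curve(delay, floor, gain, rate):
    """Accuracy rises from `floor` toward `floor + gain` as the delay grows."""
    return floor + gain * (1.0 - np.exp(-rate * delay))

# Hypothetical word-onset delays (s) and proportion-correct scores per masker.
delays = np.array([0.0, 0.1, 0.2, 0.4, 0.6, 1.1])
accuracy_noise = np.array([0.45, 0.55, 0.62, 0.70, 0.74, 0.76])   # noise masker
accuracy_babble = np.array([0.40, 0.41, 0.42, 0.41, 0.43, 0.42])  # flat pattern

for label, acc in [("noise", accuracy_noise), ("babble", accuracy_babble)]:
    params, _ = curve_fit(segregation_curve, delays, acc,
                          p0=[0.4, 0.3, 3.0], maxfev=10000)
    floor, gain, rate = params
    print(f"{label}: floor={floor:.2f}, delay benefit={gain:.2f}, rate={rate:.2f}/s")
```

A fitted delay-benefit parameter near zero corresponds to accuracy that does not grow with delay, mirroring the flat babble pattern reported for older adults below.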
The first part (Ben-David & Schneider, 2012) showed that word recognition improved for both younger and older English-as-first-language (EFL) participants as the delay between masker onset and word onset increased when the masker was a steady-state speech-spectrum noise; when the masker was a babble of voices, however, older adults showed no such benefit. In the second part (Ben-David, Avivi-Reich, & Schneider, 2014, 2016), we used the same task with two groups of 30 younger non-native English speakers, late and early immigrants. Comparing the two studies suggests that older native listeners are able to use the delay of the target from a noise masker in much the same fashion as native and non-native young adults. However, when the masker was babble, older adults did not gain from even a 1.1 s delay. Finally, the appropriate remedy for all groups (younger and older, native and non-native speakers) is to reduce background maskers and to enhance the cues (e.g., clear speech, visible speech cues, context, spatial separation of talkers) that better differentiate the signal from the background.
Theme 2 – As the speech signal unfolds, several lexical alternatives are activated in response to the incoming phonemic information; for example, the onset CAND activates both candy and candle. To achieve word identification, one has to inhibit the phonological alternatives once contradictory information accumulates (e.g., the final DY of candy rules out candle). The project extends a paradigm developed by Ben-David and colleagues (2011). Older and younger listeners are asked to follow spoken instructions referring to objects depicted on a monitor, for example, "look at the candle." Eye movements provide a window on the timing and the extent to which listeners momentarily consider competitors that share sounds with the target (candy). As a first step, we translated the paradigm into Hebrew. Next (Hadar, Skrzypek, Wingfield, & Ben-David, 2016), working-memory load was manipulated using a digit-span task: participants were asked to retain either one or four spoken digits while identifying the picture depicting the spoken word. The data showed that discriminating between the two phonologically competing alternatives was harder in the high-load than in the low-load condition, even when words were presented in quiet. These results suggest that working memory plays a role in speech perception, even for young, normal-hearing adults in ideal listening conditions. The third study (Nitzan, Skrzypek, Wingfield, & Ben-David, in prep.) shows that when words were presented in noise (SNR = -4 dB), working-memory load had an even larger effect on speech perception: discrimination between the spoken target word and its phonological alternatives was delayed much further in noise (by an average of 500 milliseconds).
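To illustrate the eye-movement logic, the following sketch estimates the point at which fixations on the target picture reliably pull away from fixations on the phonological competitor, and shows how that point can shift later in noise. The bin width, threshold, and simulated fixation curves are assumptions for illustration; this is not the project's analysis pipeline or data.

```python
# Minimal sketch of the "visual world" analysis logic: per time bin, compare the
# proportion of fixations on the target (e.g., candle) with the proportion on a
# phonological competitor (e.g., candy), and find the first bin from which the
# target reliably dominates. Bin size, threshold, and data are all assumed.
import numpy as np

BIN_MS = 50          # assumed time-bin width in milliseconds
THRESHOLD = 0.10     # assumed minimal target-competitor gap to count as resolved

def discrimination_point(target_prop, competitor_prop,
                         bin_ms=BIN_MS, threshold=THRESHOLD):
    """Return the onset (ms) of the first bin where the target-competitor gap
    exceeds `threshold` and stays above it for all later bins."""
    gap = target_prop - competitor_prop
    above = gap > threshold
    for i in range(len(above)):
        if above[i:].all():
            return i * bin_ms
    return None  # never resolved within the analysis window

# Simulated fixation proportions over 1.5 s (30 bins) for quiet vs. noise;
# the noise curves are constructed to resolve roughly 500 ms later.
t = np.arange(30)
quiet_target = 0.25 + 0.65 / (1 + np.exp(-(t - 8) / 2))    # rises early
quiet_comp   = 0.25 + 0.20 * np.exp(-((t - 6) ** 2) / 18)  # brief competition
noise_target = 0.25 + 0.60 / (1 + np.exp(-(t - 18) / 3))   # rises later
noise_comp   = 0.25 + 0.25 * np.exp(-((t - 14) ** 2) / 40)

print("quiet:", discrimination_point(quiet_target, quiet_comp), "ms")
print("noise:", discrimination_point(noise_target, noise_comp), "ms")
```

With these simulated curves, the quiet condition resolves around 400 ms and the noise condition around 900 ms, echoing the kind of noise-induced delay described above.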
Theme 3. The ability to correctly identify emotions in speech is at the core of human communication. To identify an emotion, one must process both the semantics (lexical meaning) and the prosody (tone of speech) of the utterance, and integrate the two. Deciphering this complex interplay of prosody and semantics may become even more challenging in older age: age-related changes in auditory-sensory factors and cognitive processing may hinder the correct identification of emotions in spoken language.
As a first step, we validated a novel tool, T-RES, the Test for Rating of Emotions in Speech, designed to assess the complex interaction of prosody and semantics in spoken emotions. Listeners are presented with spoken sentences in which the emotional valence of the prosody and of the semantics appears in different combinations from trial to trial, with four separate emotions (anger, fear, happiness, and sadness) and a neutral emotion serving as a baseline for performance. Listeners are asked to rate each sentence on four rating scales, one per tested emotion. With data collected from 80 younger adults with the English stimuli (Ben-David, Multani, Shakuf, Rudzicz, & Van Lieshout, 2016), we found that prosody and semantics cannot be selectively perceived one without the other, and that for younger adults the prosodic information is more dominant. In the second step, the tool was translated into Hebrew, with performance showing several similarities across the two languages (Shakuf et al., in prep.).
The third study (Shakuf, Gal-Resenbaum, & Ben-David, 2016) compared the performance of 40 older (ages 65-75) and 40 younger (ages 20-30) adults. The results reveal significant age-related differences: for younger adults, emotional ratings were driven mainly by the prosodic dimension, with only a small contribution of the semantics, whereas for older adults both dimensions contributed equally to the emotional ratings. Simply put, older adults weigh both the how and the what in perceiving emotions in speech.
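To make the weighting analysis concrete, here is a minimal sketch with simulated ratings (the valence coding, the simulation weights, and the regression approach are illustrative assumptions, not the T-RES analysis): regressing each trial's rating on the valence carried by the prosodic and semantic channels yields one weight per channel, and the younger/older contrast described above corresponds to a large prosody weight versus roughly equal weights.

```python
# Minimal sketch (simulated ratings, not T-RES data) of estimating the relative
# weight of prosody vs. semantics in emotion ratings via least-squares regression.
# The coding scheme and the group weights used for simulation are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200

# Channel codes per trial: how strongly prosody / semantics expresses the rated
# emotion (-1 = opposite emotion, 0 = neutral, 1 = the rated emotion).
prosody = rng.choice([-1.0, 0.0, 1.0], size=n_trials)
semantics = rng.choice([-1.0, 0.0, 1.0], size=n_trials)

def simulate_ratings(w_prosody, w_semantics):
    """Ratings on a 1-6 scale driven by a weighted mix of the two channels."""
    base = 3.5 + w_prosody * prosody + w_semantics * semantics
    return np.clip(base + rng.normal(0, 0.3, n_trials), 1, 6)

def channel_weights(ratings):
    """Least-squares weights of prosody and semantics (plus an intercept)."""
    X = np.column_stack([np.ones(n_trials), prosody, semantics])
    beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return beta[1], beta[2]

# Assumed pattern following the report: prosody dominates for younger adults,
# while the two channels contribute about equally for older adults.
for group, (wp, ws) in {"younger": (1.2, 0.3), "older": (0.8, 0.8)}.items():
    bp, bs = channel_weights(simulate_ratings(wp, ws))
    print(f"{group}: prosody weight={bp:.2f}, semantics weight={bs:.2f}")
```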