Final Report Summary - EMOTION IN SPEECH (Emotional prosody in speech: the importance of pitch, timing and loudness)
International Incoming Fellowships (IIF)
FP7-PEOPLE-2011-IIF
Grant agreement number 302452
“Emotional prosody in Speech: The importance of pitch, timing, and loudness”
This work was carried out in the Laboratoire Psychologie de la Perception in Paris, France (http://lpp.psycho.univ-paris5.fr). The main goals of this project were to better understand perception of emotional prosody on its own and to put it in a more universal context by comparing it with emotional expression in music. Thus, the project comprised three major sections: First, we developed a validated set of emotional utterances in French. Second, we tested the effect of second language (English) knowledge in French speakers on perception of American prosody. Third, we created a set of musical stimuli expressing anger and joy and tested these along with equivalent speech stimuli in an EEG study.
Validated emotional utterances
The goal of this section of the project was to establish a set of stimuli that can be used by French researchers to study emotional prosody. Six actors (four female, two male) were asked to record two sentences with eight different emotions (anger, disgust, joy, sadness, interest, pride, relief, and fear) and without emotion (neutral). The two sentences were "L'avion est presque plein" (The airplane is almost full) and "J'espère qu'il va m'appeler bientôt" (I hope he will call me soon). The recordings were done in a soundproof booth using Cool Edit Pro.
Thirty-one (23 F, 8 M) monolingual French-speaking participants (age range 18-28, M = 22) were presented with 388 utterances and asked to select which of the eight emotions was being conveyed by the utterance.
The most consistently correctly classified utterance was selected for each sentence/speaker/emotion combination. For one actress, many utterances were recognized at below-chance rates or markedly less accurately than those of the other actors; therefore, only the utterances from the remaining three actresses and the two actors were retained. This resulted in 5 actors x 2 sentences x 9 categories (8 emotions plus neutral) = 90 utterances. Even within this small selection, acoustic measurements showed that the way emotions were expressed by the different actors varied widely.
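As a sketch of how this selection step could be implemented, the hypothetical Python snippet below computes per-utterance recognition accuracy and keeps the best-recognized recording for each speaker/sentence/emotion combination; the table layout and column names are assumptions for illustration, not the project's actual data files.

```python
# Hypothetical sketch of the utterance-selection step, assuming the rating
# responses are stored in a long-format table with one row per participant
# x utterance. Column names (participant, speaker, sentence, emotion,
# utterance_id, response) are illustrative, not the project's actual files.
import pandas as pd

ratings = pd.read_csv("validation_ratings.csv")

# A trial is correct if the chosen emotion matches the intended one.
ratings["correct"] = ratings["response"] == ratings["emotion"]

# Recognition accuracy per recorded utterance.
accuracy = (ratings
            .groupby(["speaker", "sentence", "emotion", "utterance_id"])["correct"]
            .mean()
            .reset_index(name="accuracy"))

# Chance level for an eight-alternative forced choice.
chance = 1 / 8

# Keep, for each speaker/sentence/emotion combination, the utterance that
# was most consistently classified correctly.
best = accuracy.loc[
    accuracy.groupby(["speaker", "sentence", "emotion"])["accuracy"].idxmax()
]

# Combinations whose best utterance is still at or below chance (as happened
# for one actress) can then be flagged and excluded.
weak = best[best["accuracy"] <= chance]
print(len(best), "selected utterances;", len(weak), "still at or below chance")
```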
Currently, these utterances are being further validated in a laboratory in Lyon (Auditory Cognition and Psychoacoustics) with the aim of testing emotion perception in speech among amusic participants. These participants have difficulties processing pitch information in music; hence, we hypothesize that they will also have difficulty processing emotion in speech prosody, which is heavily reliant on pitch information.
This section was an essential foundation for research such as the amusia project, because a validated set of utterances expressing different emotions previously existed only in other languages or in pseudo-French.
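To illustrate the kind of acoustic measurements mentioned above (pitch, timing, and loudness), the sketch below extracts simple summary measures from a single utterance using Parselmouth, a Python interface to Praat; the tool choice, file name, and summary statistics are assumptions for illustration, not necessarily those used in the project.

```python
# Illustrative acoustic measurements for one utterance (pitch, timing,
# loudness), assuming Parselmouth (a Python interface to Praat) is installed.
# The file name and the chosen summary statistics are assumptions.
import numpy as np
import parselmouth

snd = parselmouth.Sound("speaker1_avion_joy.wav")  # hypothetical file name

# Pitch: fundamental-frequency track; unvoiced frames are returned as 0 Hz.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                       # keep voiced frames only

# Timing: overall duration of the utterance in seconds.
duration = snd.duration

# Loudness: intensity contour in dB.
intensity = snd.to_intensity()
db = intensity.values.flatten()

print(f"mean F0 = {f0.mean():.1f} Hz, F0 range = {np.ptp(f0):.1f} Hz")
print(f"duration = {duration:.2f} s, mean intensity = {db.mean():.1f} dB")
```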
Second language knowledge and the perception of emotion
The goal of this section of the project was twofold: it was the first evaluation of the impact of second language ability on emotion recognition within a single population, and it connected the project with ongoing cross-linguistic research in the laboratory. We tested French speakers with varying levels of knowledge of English on their recognition of emotions in stimuli from the corpus “Vocal Expressions of Nineteen Emotions across Cultures”: American English emotional prosody, American nonverbal vocalizations (here called “affect bursts”), and pseudo-French (from the Geneva Multimodal Emotion Portrayals). We examined the relation between their self-reported English ability and their recognition and ratings of emotions in English prosody and affect bursts. We found that English ability was related only to emotional prosody, and only for positive emotions: lower recognition accuracy was correlated with better English ability. We attribute this surprising finding to interference between the semantic content of the utterance and its emotional prosody, which may be stronger for participants with better English ability. Only positive emotions were affected, possibly because they have previously been shown to be less universal than negative emotions, that is, more susceptible to cross-cultural differences; recognition of positive emotions may therefore be less stable in general. A paper on this section of the project has been published in PLOS ONE (Second language ability and emotional prosody perception; doi: 10.1371/journal.pone.0156855).
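As an illustration of the kind of relation examined here, the hypothetical sketch below correlates self-reported English ability with each participant's recognition accuracy for positive emotions in English prosody; the data layout and the use of a simple rank correlation are assumptions for illustration, and the published paper's actual analyses may differ.

```python
# Hypothetical sketch relating self-reported English ability to recognition
# accuracy for positive emotions in English prosody. Column names and the
# simple rank correlation are illustrative; the published analyses may differ.
import pandas as pd
from scipy.stats import spearmanr

data = pd.read_csv("l2_emotion_recognition.csv")

# Keep only prosody trials with positive target emotions.
positive = data[(data["stimulus_type"] == "prosody") &
                (data["valence"] == "positive")]

# Per-participant accuracy and (constant) self-reported English ability.
per_participant = (positive.groupby("participant")
                   .agg(accuracy=("correct", "mean"),
                        english_ability=("english_ability", "first"))
                   .reset_index())

# The reported finding corresponds to a negative correlation here:
# better English ability, lower accuracy for positive emotions.
rho, p = spearmanr(per_participant["english_ability"],
                   per_participant["accuracy"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```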
Emotion in music and speech: An EEG study
This section examined automatic recognition of speech prosody using an implicit measure and compared music and speech to each other in order to explore universal aspects of auditory emotion. Two professional improvisational musicians (a clarinetist and a flutist) performed two tasks: first, they imitated the pitch and rhythm of angry and happy speech utterances selected from the recordings described in the first section of this report, and second, they improvised their own happy and angry melodies. A pilot test showed that the imitations of speech did not communicate emotion effectively, whereas the improvisations did, perhaps because a musical instrument cannot reliably reproduce all of the aspects of speech that allow it to communicate emotion.
These improvised melodies and the speech stimuli were then used in an EEG study asking whether the brain processes emotion from these two types of stimuli in similar ways. Participants heard happy or angry speech or music, and 500 ms later saw a happy or angry face. Brain responses to congruent (i.e. matching emotion, both happy or both angry) and incongruent pairs were compared. The response to incongruent trials was more negative between 400 and 600 ms (i.e. an N400 effect) for the happy speech trials (see Fig. A, showing the response from a fronto-central electrode) but not for the angry speech trials (Fig. B). There was no congruency effect for music, though the responses between 400 and 600 ms were overall more negative in fronto-central regions for happy than for angry music, showing that participants differentiated between the two emotions.
The lack of an effect for the angry speech could be because the angry speech itself affected the participants’ arousal levels, masking any congruency response. The lack of an effect for the music is more difficult to pin down, given that the participants did differentiate between the two musical emotions. Perhaps the emotion was not expressed as clearly as in the speech, so the (in)congruency was never salient enough to produce differences in brain activity. A paper on this section of the project is currently in preparation.
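As a sketch of the congruency comparison described above, the snippet below contrasts mean amplitudes in the 400-600 ms window at a single fronto-central channel for congruent versus incongruent trials; the array shapes, sampling rate, and file names are assumptions for illustration, not the project's actual EEG pipeline.

```python
# Illustrative congruency comparison: mean amplitude 400-600 ms after face
# onset at one fronto-central channel, congruent vs. incongruent trials.
# Sampling rate, epoch timing, and file names are assumptions.
import numpy as np
from scipy.stats import ttest_rel

sfreq = 500.0        # assumed sampling rate (Hz)
tmin = -0.2          # assumed epoch start relative to face onset (s)
n_times = 1000
times = np.arange(n_times) / sfreq + tmin

# Hypothetical per-subject condition averages at one channel:
# arrays of shape (n_subjects, n_times).
congruent = np.load("erp_congruent_fc.npy")
incongruent = np.load("erp_incongruent_fc.npy")

# Mean amplitude in the N400 window (400-600 ms).
window = (times >= 0.4) & (times <= 0.6)
cong_mean = congruent[:, window].mean(axis=1)
incong_mean = incongruent[:, window].mean(axis=1)

# Paired comparison across subjects; an N400-like congruency effect appears
# as incongruent trials being more negative than congruent ones.
t, p = ttest_rel(incong_mean, cong_mean)
print(f"incongruent - congruent: t = {t:.2f}, p = {p:.3f}")
```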
Overall, this project highlighted the complexity of emotional prosody perception. In the first section, we found large acoustic variability even within a small selection of accurately identified utterances. In the second section, we found that, contrary to intuition, greater second language (L2) ability hindered rather than helped emotion identification in that L2. And in the third, we showed that an N400 congruency response can vary according to the base emotions being conveyed.