Speech Prosody in Interaction: The form and function of intonation in human communication

Project information

SPRINT

Grant agreement ID: 835263

DOI

10.3030/835263

Project closed

EC signature date 7 May 2019

Start date 1 October 2019

End date 30 June 2025

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 2 481 196.00

EU contribution

€ 2 481 196.00

Coordinated by

STICHTING RADBOUD UNIVERSITEIT
Netherlands

Periodic Reporting for period 4 - SPRINT (Speech Prosody in Interaction: The form and function of intonation in human communication)

Reporting period: 2024-04-01 to 2025-06-30

An important yet often overlooked aspect of speech is its melody, also known as intonation: the systematic modulation of voice pitch for linguistic purposes. A better understanding of intonation is important because it plays a critical role in communication. It helps us organize speech into phrases, highlight important words, and convey nuances of our intended meaning. The importance of intonation becomes clear when it “malfunctions” during communication. This can happen, for example, when a listener fails to detect irony, or when a learner transfers a melody from their native language to their second language, where the melody is used differently. As our research shows, such misapplication of intonation is also characteristic of AI speech, which can imitate the melodies used by humans but may use them in the wrong context.

Such miscommunications cannot be remedied without a good understanding of intonation’s structure and functions but, as noted, intonation has been overlooked. As a result, research has either neglected intonation (e.g. by studying conversation without considering intonation) or treated it as an acoustic signal without considering its linguistic structure. The aim of SPRINT has been to address the challenges that intonation variability poses, by documenting variability and advancing new methodological principles for handling it, with a view to developing a comprehensive theory of intonation structure and meaning.

SPRINT focused primarily on speech production, where intonation variability is considered most challenging. To address this challenge, we adopted established statistical modelling techniques that had not previously been used for the study of intonation, and applied them to conversational English and Greek. We showed that, with the right methodologies, studying the intonation of spontaneous speech is not only possible but also critical for understanding complex intonational phenomena. We further tested our findings by conducting comprehension experiments.

A main conclusion from SPRINT is that intonation behaves similarly to other speech phenomena that pertain to consonants and vowels (segments). Intonation categories have distinct phonetic realizations which show some overlap, so that the same phonetic realization can belong to different categories. This, however, is not a problem for comprehension: although pitch is the main phonetic exponent of intonation, additional cues (e.g. loudness and duration) are also used, a phenomenon known as cue-redundancy. Further, some redundant cues are used by speakers to compensate for the lack of optimal pitch use, a phenomenon known as cue-trading. Overlap, cue-redundancy, and cue-trading are well-established features of segments. Their use in intonation provides evidence that intonation and segmental categories are comparable. We further found that the phonetic detail and pragmatic function of intonation categories may be more salient to some speakers than to others, depending on their levels of empathy, musicality, and autistic-like traits. The interplay between all these factors can lead to speakers of a language developing slightly different intonation systems. Our figure illustrates this point: in English, speakers barely differentiate between highlighting words that present contrastive information and highlighting those that are simply new (e.g. the pitch of "black" is virtually the same whether speakers say "A black coffee, please", where "black" is new, or "no, I want BLACK coffee (not a cappuccino)", where "black" is contrastive). Greek speakers, on the other hand, differentiate the two using a fall for new and a rise-fall for contrastive information.

Overall, our research has helped us understand how intonation is structured and has led to the development of new methodological approaches that can become a blueprint for others to follow, further advancing the study of intonation. Such advancement can lead to breakthroughs in the teaching of intonation in second language acquisition, and the development of more natural synthetic voices.

Since the beginning of the project, spoken data have been collected in Canterbury and Oxford in the UK, and in Athens, Greece. These data have been analysed using techniques that allowed us to filter out random variability and document essential differences between the linguistic categories that are the building blocks of intonation. This has been followed by comprehension studies using a variety of paradigms, including the examination of individual variation and its potential sources. Taken together, our studies have shed light on the structure of intonation and its function in conversation. Finally, part of our work has focused on developing new methodologies, such as the use of images instead of the acoustic signal for studying intonation.

The project’s findings have been presented at 24 international conferences, the majority of which are attended by both the academic community and industry. The SPRINT team have also given 18 invited talks, and have published 19 papers (with several more in the pipeline). Further, we have disseminated our methods and findings in two highly popular workshops, a tutorial, a special session on Greek prosody, and several review papers. Finally, our activities have been disseminated through social media and the project's website.

In SPRINT we took a novel approach to collecting, annotating and analysing intonation. First, we collected and studied natural speech, which had previously been largely avoided because it was considered too difficult. Our results clearly demonstrate that studying spontaneous speech is possible and worthwhile. Further, our analytical approach breaks down the task of annotation (the tagging of data for further analysis) and has allowed us to gain new insights; e.g. we have shown that in British English important words are highlighted with both falls and rise-falls in pitch (contra the standard British approach, which holds that only falls are attested), but that this distinction in form does not correspond to a distinction in function (contra American approaches, which maintain a clear link between form and function). Our novel practices have the potential to lead to a renewed interest in intonation research and to more insightful results. Additionally, we have used statistical and computational methodologies that, although established in other fields, have rarely been used for the study of intonation before. By doing so, we have demonstrated that studying variability in intonation is feasible and desirable, as it leads to more insightful answers. Finally, we have demonstrated the value of triangulating conclusions from production by seeking support from comprehension. By examining the same problem from several perspectives, we have been able to provide comprehensive answers to disputes that have plagued the field for decades. As our results are promising, our methods could become part of the standard toolkit for intonation research. Although the project has now ended, several studies are being prepared for publication; these are likely to, at the very least, consolidate the current insights from SPRINT and potentially lead to additional innovations.

Differences in the intonation used in English and Greek to highlight particular words
