
LAUGHTER IN CONVERSATION: ENHANCING THE NATURALNESS OF DIALOGUE SYSTEMS

Periodic Reporting for period 1 - HA-HA (LAUGHTER IN CONVERSATION: ENHANCING THE NATURALNESS OF DIALOGUE SYSTEMS)

Reporting period: 2018-11-01 to 2020-10-31

With smartphones and computers now an integral part of everyday life, people interact more and more with their devices. Although there have been important recent advances in speech technology, such as better automatic recognition and synthesis, automatic systems still do not exploit the richness of cues found in human conversation. In this project we explored how laughter, one of the most frequent non-verbal vocalisations in human conversation, may be integrated into a spoken dialogue system. To this end, we first performed a quantitative analysis of laughter use in human interaction, taking into account factors such as the level of analysis and the influence of the conversational partner. Next, we studied acoustic cues able to discriminate laughter from speech and developed an automatic laughter detection system based on them. Finally, we conducted a perceptual experiment in which adult participants judged the outcome of a conversation between a virtual agent and a person, with the agent either employing laughter or not.
In the first part of the project we focused on the use of laughter in conversation. We showed that the onset of laughter may be anticipated before it occurs, through spectral changes affecting the syllable that precedes the laughter, a finding with potential applications for automatic laughter detection. Next, we performed a quantitative analysis of laughter, examining how it is distributed across two levels of linguistic organization (utterance and speaker turn) in three languages. This analysis gave us insights into the cross-linguistic (or cross-cultural) use of laughter, suggesting that laughter is used in a more universal way at the utterance level than at the turn level. Finally, since conversational partners tend to influence each other during their interaction and become more similar as a result (a phenomenon called entrainment), we investigated whether this phenomenon also occurs for laughter. Analysing the data at the laughter-token and turn levels, we observed temporal entrainment at both. At the laughter-token level, it appeared in how the speakers of a dialogue distribute their laughter events throughout the conversation. At the turn level, speakers aligned their laughter more with the beginning and the end of their turns (and more so towards the end of the conversation than at its beginning).

The second part of the project investigated features for discriminating laughter from speech and developed an automatic method for laughter detection. As previous laughter studies have shown that humans rely on rhythm information to perceive laughter, we examined two rhythm representations based on the modulation of the speech signal. The analysis revealed that the two representations, which encode the variation of the signal envelope and of its temporal fine structure, respectively, may discriminate between laughter (laughs and speech-laughs) and speech. We then used this information to develop an automated laughter detection method that takes the speech signal as input, computes its modulation spectrum and determines, based on this representation, the time intervals where laughter may occur. This automated method was then integrated into a semi-automatic laughter annotation procedure, in which manual annotation is limited to the time intervals returned by the automatic tool. An evaluation of the proposed semi-automatic procedure showed that it decreased the time required for annotation by 30% on average, while keeping an inter-rater reliability similar to that of the fully manual procedure, at the cost of missing 10% of the laughter events.
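The detection steps described above (speech signal in, envelope modulation analysis, candidate time intervals out) can be illustrated with a minimal Python sketch. This is not the project's tool: the function names, the rectify-and-smooth envelope, the 4 to 6 Hz modulation band, the 0.35 threshold and the window sizes are all assumptions chosen for illustration, and only the envelope-based representation is covered, not the temporal fine structure.

```python
import cmath
import math


def amplitude_envelope(signal, sample_rate, smooth_s=0.02):
    """Crude amplitude envelope: rectify, then smooth with a moving average."""
    half = max(1, int(smooth_s * sample_rate) // 2)
    prefix = [0.0]  # prefix sums make each window average O(1)
    for x in signal:
        prefix.append(prefix[-1] + abs(x))
    n = len(signal)
    envelope = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope


def relative_modulation(envelope, sample_rate, mod_freq):
    """Envelope spectrum magnitude at mod_freq (Hz), divided by the mean level."""
    n = len(envelope)
    mean = sum(envelope) / n
    if mean == 0:
        return 0.0  # silent frame: no modulation to measure
    acc = 0j
    for k, e in enumerate(envelope):
        acc += (e - mean) * cmath.exp(-2j * math.pi * mod_freq * k / sample_rate)
    return abs(acc) / (n * mean)


def candidate_intervals(signal, sample_rate, band=(4, 5, 6), threshold=0.35,
                        win_s=1.0, hop_s=0.5):
    """Return merged (start, end) times in seconds of strongly modulated windows.

    band and threshold are illustrative assumptions, not project values.
    """
    envelope = amplitude_envelope(signal, sample_rate)
    win, hop = int(win_s * sample_rate), int(hop_s * sample_rate)
    intervals = []
    for start in range(0, len(envelope) - win + 1, hop):
        frame = envelope[start:start + win]
        strength = max(relative_modulation(frame, sample_rate, f) for f in band)
        if strength >= threshold:
            begin, end = start / sample_rate, (start + win) / sample_rate
            if intervals and begin <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # merge overlapping windows
            else:
                intervals.append((begin, end))
    return intervals
```

In a semi-automatic workflow of the kind described above, an annotator would then inspect only the returned intervals rather than the whole recording. A real detector would need a trained classifier and tuned parameters rather than a fixed threshold.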

An online perceptual experiment was designed for the third part of the project, in which the participants watched a conversation between a real-estate agent and a client during an apartment visit. The speech of the dialogue system-based agent was synthesized using a state-of-the-art system, and we manipulated the presence of social laughter in the real-estate agent's voice. The participants were asked to judge the interaction of the agent with the client, as well as to evaluate the agent on several dimensions, such as professionalism or pleasantness. We also included a control condition, in order to test whether the participants show the same differences between the no-laughter and the laughter-enhanced conditions for both a virtual and a human agent.

The results of the project were disseminated in the form of five published proceedings articles in highly relevant conferences and workshops, one accepted conference abstract, as well as one submitted journal article, and one article in preparation. Moreover, we organized the sixth edition of the Workshop on Laughter and Other Non-Verbal Vocalisations, thus further increasing the visibility of the project in the scientific community. The scope and the activities of the project have also been communicated through non-scientific actions, such as interviews for the university's blog and the MSCA Fellow of the Week programme. In terms of exploitation of results, we agreed on a collaboration with colleagues from the private sector for testing our measures of speech entrainment in a business setting, with the possible development of a tool for the automatic analysis of business relationships. Finally, the developed semi-automatic laughter annotation tool was made available online, for any interested parties.
The work performed here sheds further light on the use of laughter in human communication, especially on the effect that conversational partners have on each other during interaction. These findings would have to be taken into account in any dialogue model or dialogue manager module of a spoken dialogue system that aims for naturalistic interaction. Their implementation in social-conversational virtual agents may result in more natural interaction and in greater acceptance of these agents by populations that could benefit from the technology (such as residents of retirement homes). The acceptance of a technology does not depend only on its characteristics (for instance, its naturalness or its ease of use), but also on whether its users accept a particular human-like feature in their conversation with a machine, an issue explored in the last part of the project. The potential socio-economic impact of the project includes the aspect discussed under the exploitation of results: an increased innovation capacity through the development of automatic evaluation tools.
Aspects examined by the project