Periodic Reporting for period 3 - ECOLANG (Ecological Language: A multimodal approach to language and the brain)
Reporting period: 2021-01-01 to 2022-06-30
ECOLANG studies language learning and comprehension in the real world, investigating language as the rich set of cues available in face-to-face communication. We ask whether and how children and adults use multimodal cues to learn new vocabulary and to process known words. We further ask how the brain integrates the different types of multimodal information during language learning and comprehension.
Using a real-world approach to language learning and comprehension provides key novel insights that can enhance treatments for developmental and acquired language disorders. It also provides novel constraints for automatic language processing, helping such systems to learn and process language, and to interact with humans, more effectively.
We have begun assessing the interaction between linguistic predictability (measured as linguistic surprisal based on n-gram or RNN models) and multimodal cues such as prosodic modulation and the presence of gestures. We asked whether word surprisal predicts whether speakers produce a prosodic modulation (we focused on word duration), expecting longer durations for words that are less predictable given the preceding context. We also looked at representational gestures (gestures that imagistically depict what is being talked about), expecting speakers to produce more gestures for more surprising words. This is precisely what we observed (Grzyb, Vigliocco & Frank, in prep). Overall, this work indicates the potential of using computational models to assess the interdependence between the use of specific words and multimodal cues such as gestures or prosodic modulation.
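To make the surprisal measure concrete, the sketch below is a minimal illustration (not the project's actual pipeline) that computes bigram surprisal, -log2 P(word | previous word), with add-one smoothing over a toy corpus; the project's estimates also draw on RNN language models, and the sentences and numbers here are purely illustrative.

```python
# Minimal sketch: bigram surprisal of each word given its predecessor,
# as one simple instantiation of "linguistic surprisal".
import math
from collections import Counter

def bigram_surprisal(corpus_sentences, target_sentence):
    """Surprisal (-log2 P(w | previous word)) for each word in target_sentence,
    estimated from corpus_sentences with add-one smoothing."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus_sentences:
        tokens = ["<s>"] + sent.lower().split()
        vocab.update(tokens)
        for prev, curr in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, curr)] += 1
    v = len(vocab)
    tokens = ["<s>"] + target_sentence.lower().split()
    surprisals = []
    for prev, curr in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, curr)] + 1) / (unigrams[prev] + v)
        surprisals.append((curr, -math.log2(p)))
    return surprisals

# Toy illustration: higher surprisal is expected to pattern with longer
# word durations and more representational gestures in production.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
print(bigram_surprisal(corpus, "the cat sat on the rug"))
```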
In behavioural studies, we developed quantitative measures of the informativeness of gestures and, especially, mouth movements to assess their impact, and that of their interactions, on word recognition. We found that gesture informativeness and informative mouth movements speed up word recognition (Krason, Fenton & Vigliocco, in prep). In electrophysiological work using naturalistic stimuli, we investigated whether the presence of multimodal cues (and their combination) modulates a biomarker of processing difficulty (the N400). We found that all the multimodal cues we investigated (representational gestures, beats, prosodic stress and mouth movements) affect the processing of words for first- and second-language users, indicating that these cues are central to language processing. We also found that their impact changes dynamically depending upon their informativeness, and we found a hierarchy: prosody shows the strongest effect, followed by gestures and mouth movements (Zhang, Frassinelli, Tuomainen & Vigliocco, 2020). Thus, these studies provide a first snapshot of how the brain dynamically weights audiovisual cues in language comprehension.
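As an illustration of how word-level cue annotations can be weighted against a neural index such as the N400, the sketch below regresses simulated single-trial amplitudes on binary cue predictors. It is a minimal, self-contained example with simulated data and assumed effect sizes, not the analysis reported in Zhang, Frassinelli, Tuomainen & Vigliocco (2020).

```python
# Illustrative sketch only: regressing simulated single-trial N400
# amplitudes on binary cue predictors to compare cue weights.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500

# Hypothetical word-level annotations: 1 = cue present on this word.
gesture = rng.integers(0, 2, n_trials)
prosodic_stress = rng.integers(0, 2, n_trials)
mouth_informative = rng.integers(0, 2, n_trials)

# Simulated amplitude with an assumed cue hierarchy
# (prosody > gesture > mouth movements) plus noise.
n400 = (2.0 * prosodic_stress + 1.2 * gesture + 0.6 * mouth_informative
        + rng.normal(0, 1.5, n_trials))

X = np.column_stack([np.ones(n_trials), prosodic_stress, gesture, mouth_informative])
betas, *_ = np.linalg.lstsq(X, n400, rcond=None)
print(dict(zip(["intercept", "prosody", "gesture", "mouth"], betas.round(2))))
```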
Our starting point is the development of a corpus of dyadic communication between an adult and a child, or between two adults. To our knowledge, this will be the first naturalistic annotated corpus that comprises both adult-to-adult and adult-to-child conversations and that manipulates key aspects of the context. These manipulations are expected to bring about differently weighted combinations of the cues: for example, will visible cues (gesture, mouth movements), in addition to prosody, be more prominent in child-directed than in adult-directed language? Is gesture different when objects are present (pointing to the objects) vs. when they are absent (gestures iconic of referent properties)? We will also develop initial computational models of how multimodal cues are combined in spoken language, tested against the results of behavioural and electrophysiological studies investigating whether and how multimodal cues affect language learning and processing. Finally, we expect to obtain some of the very first evidence concerning how neural networks are orchestrated in multimodal language, combining fMRI and patient studies.
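To illustrate the kind of information such corpus annotations need to capture, here is a hypothetical record for a single utterance; the field names and values are illustrative assumptions, not the actual ECOLANG annotation scheme.

```python
# Hypothetical annotation record for one utterance in a multimodal
# dyadic corpus; fields are illustrative, not the ECOLANG scheme.
utterance = {
    "dyad_id": "A2C_012",             # adult-to-child vs. adult-to-adult condition
    "objects_present": True,          # context manipulation: referents visible or not
    "speaker": "adult",
    "start_s": 12.34,
    "end_s": 14.10,
    "words": [
        {"form": "look",  "onset_s": 12.34, "duration_s": 0.28, "surprisal_bits": 5.1},
        {"form": "a",     "onset_s": 12.66, "duration_s": 0.07, "surprisal_bits": 2.0},
        {"form": "tiger", "onset_s": 12.75, "duration_s": 0.45, "surprisal_bits": 9.3},
    ],
    "gestures": [
        {"type": "point", "target": "toy_tiger", "onset_s": 12.70, "offset_s": 13.40},
    ],
    "prosody": {"pitch_peak_word": "tiger"},
    "mouth_informativeness": 0.72,    # e.g. a rating or model-derived score
}
print(utterance["words"][2]["form"], utterance["gestures"][0]["type"])
```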