Audiovisual Speech Segmentation and Oscillations

Periodic Reporting for period 1 - AVISSO (Audiovisual Speech Segmentation and Oscillations)

Reporting period: 2016-05-02 to 2018-05-01

Speech is often multimodal: listeners need to integrate the visual prosody conveyed by a speaker's body movements with auditory prosody, which appears to involve temporal coordination between the two modalities. Listeners may rely on the integration of the temporal prosodic structure conveying information at the delta frequency (1-3 Hz) in both modalities to segment inputs and facilitate audiovisual (AV) speech processing. How the brain exploits naturally aligned AV rhythms to facilitate speech segmentation was the central question of this proposal. The project investigated the oscillatory correlates of audiovisual speech, focusing on the temporal integration of body movements associated with suprasegmental features in speech (audiovisual prosody). Beyond the neural basis of AV speech processing, this project relates to a wider range of relevant areas (e.g. communication), and its outcomes may inform the rehabilitation of speech dysfunctions in clinical populations suffering from temporal-processing deficits (e.g. Parkinson's disease, stroke). The overall objective of the project was to establish how the precise temporal alignment between visual and auditory rhythmic features conveyed by prosody, and its neural integration, contribute to successful AV speech processing. I adopted a multimodal approach to investigate the oscillatory neural correlates supporting the integration of prosodic features at the delta frequency (WP1), and the potential involvement of the temporal-processing network (i.e. pre-Supplementary Motor Area, basal ganglia, and cerebellum) in prosody-based multimodal speech integration (WP2).
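By way of illustration only (this is a minimal sketch, not the project's actual analysis code; the function names, sampling rate, and the visual motion input are all assumptions), the delta-scale temporal alignment between the two modalities could be quantified by band-passing the auditory amplitude envelope and a visual motion signal to 1-3 Hz and locating their peak cross-correlation lag:

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

FS = 100.0  # assumed common sampling rate (Hz) after resampling both streams

def delta_band(x, fs=FS, low=1.0, high=3.0):
    # Zero-phase band-pass to the delta range (1-3 Hz)
    sos = butter(4, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def envelope(x):
    # Amplitude envelope via the analytic (Hilbert) signal
    return np.abs(hilbert(x))

def av_lag(audio, visual_motion, fs=FS, max_lag_s=1.0):
    # Lag (s) at which the delta-band auditory envelope and a visual
    # motion signal (e.g. frame-to-frame head/lip motion energy,
    # hypothetical input) are maximally correlated
    a = delta_band(envelope(audio), fs)
    v = delta_band(visual_motion, fs)
    a = (a - a.mean()) / a.std()
    v = (v - v.mean()) / v.std()
    max_lag = int(max_lag_s * fs)
    def xc(l):  # mean product of the overlapping, z-scored segments at lag l
        return np.mean(a[max(l, 0):len(a) + min(l, 0)] *
                       v[max(-l, 0):len(v) + min(-l, 0)])
    best = max(range(-max_lag, max_lag + 1), key=xc)
    return best / fs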
Work completed:
- Implementation of the experimental paradigms (including task design, creation and rating of the stimuli, and programming of the task for the EEG and fMRI studies).
- EEG data acquisition and analyses.
- fMRI piloting.
- Collaborations with colleagues on different related projects.
- Student supervision, tutoring of Problem-Based Learning sessions, and grading.
- Preparation and submission of manuscripts for publication.
- Securing a 4-year postdoctoral fellowship (Sir Henry Wellcome Postdoctoral Fellowship, UK) to follow the present fellowship.

Main results:
The main results of the project can be summarized as follows:
(1) When semantic content was degraded, listeners successfully relied on corresponding visual and auditory prosodic features to decide whether the two modalities were presented synchronously or asynchronously. In contrast, when lip-movement information was removed with a blurred mask over the speaker's face, listeners were still able to match the visual and auditory modalities, but only when the audiovisual temporal alignment was intact. These results established that listeners not only rely on visual-auditory correspondence at a syllabic time scale, but also extract the temporal structure conveyed at the slower delta time scale by prosodic information to successfully process multimodal speech.
(2) At the neural level, we found a specific increase in delta (1-3 Hz) power when the visual and auditory modalities were presented asynchronously as compared to synchronously. This increase in power may reflect listeners' increased difficulty in extracting the temporal structure of prosody when the visual and auditory speech inputs are misaligned.
(3) In line with our hypothesis, the increase in delta power was found over a left fronto-central sensor area, which may reflect the convergence of visual and auditory delta information to generate high-level temporal predictions that improve audiovisual speech processing (i.e. engaging the supplementary motor area, SMA).
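As an illustration of the kind of contrast underlying result (2), the sketch below computes baseline-corrected delta-band power with Morlet wavelets in MNE-Python and takes the asynchronous-minus-synchronous difference. It is a hedged example, not the project's actual pipeline: 'epochs_sync' and 'epochs_async' are assumed, pre-cut mne.Epochs objects, and the baseline window and wavelet settings are illustrative.

import numpy as np
from mne.time_frequency import tfr_morlet

freqs = np.arange(1.0, 3.5, 0.5)  # delta range, 1-3 Hz
n_cycles = freqs * 2.0            # short wavelets; epochs must span a few seconds

def delta_power(epochs):
    # Trial-averaged delta-band power (channels x freqs x times),
    # expressed as percent change from the pre-stimulus baseline
    tfr = tfr_morlet(epochs, freqs=freqs, n_cycles=n_cycles,
                     return_itc=False, average=True)
    tfr.apply_baseline(baseline=(-0.5, 0.0), mode="percent")
    return tfr

# Asynchronous-minus-synchronous contrast; a positive difference over
# left fronto-central sensors would match the effect reported above.
diff = delta_power(epochs_async).data - delta_power(epochs_sync).data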

Exploitation and dissemination:
Conference Presentations:
1. Branzi, F. M., Biau, E., Martin, C. D., & Costa, A. (2017). Bilingual lexical access is triggered by the intention to speak: behavioral and ERP/EEG evidence. Dutch Neuroscience Meeting, June 15-16, Lunteren (The Netherlands). Poster presentation.
2. Branzi, F. M., Biau, E., Martin, C. D., & Costa, A. (2017). Bilingual lexical access is triggered by the intention to speak: behavioral and ERP/EEG evidence. Cognitive Neuroscience Society (CNS) Annual Meeting, March 25-28, San Francisco (USA). Poster presentation.
3. Biau, E. (2017). Beat gestures in audiovisual speech: Prosody extends to the speaker's hands. Invited speaker at the Max Planck Institute, Gesture Centre, Nijmegen (The Netherlands).
4. Biau, E. (2016). Beat gestures and speech processing: When prosody extends to the speaker's hands. Sensorimotor speech symposium at the SNL meeting, UCL, London. Talk.

Manuscripts:
1. Biau, E., Fromont, L., & Soto-Faraco, S. (2017). Beat gestures and syntactic parsing: An ERP study. Language Learning.
2. Fromont, L. A., Soto-Faraco, S., & Biau, E. (2017). Searching high and low: Prosodic breaks disambiguate relative clauses. Frontiers in Psychology, 8:96.
3. Biau, E., & Kotz, S. A. Lower beta: a central coordinator of temporal prediction in multimodal speech (in review).
4. Schwartze, M., Brown, R. M., Biau, E., & Kotz, S. A. Timing the "magical number seven": effects of temporal structure on verbal working memory (in review).
5. Biau, E., Gunter, T., & Kotz, S. A. Audiovisual speech processing relies on multimodal prosody integration (in preparation).
6. Schultz, B. G., Biau, E., & Kotz, S. A. Frame rate control during audiovisual presentation in EEG paradigms: a new toolbox (in preparation).
7. Biau, E., Schultz, B. G., Schwartze, M., & Kotz, S. A. Mind the gap: oscillatory correlates of temporal predictions across modalities (in preparation).
8. Schultz, B. G., Biau, E., Schwartze, M., & Kotz, S. A. Oscillatory correlates of temporal predictions across modalities in a sensorimotor task (in preparation).

Audiovisual speech processing supports normal conversation, and its impairment can dramatically affect patients' daily social interactions. How the brain takes advantage of the rhythmic features conveying prosodic information to facilitate the temporal integration of signal structure, and thus correct audiovisual speech processing, remains unclear; deficits in speech perception may relate to temporal-processing functions rather than purely sensory processing. Combining adapted behavioral and neuroimaging techniques, we established that audiovisual speech processing based on the temporal alignment between visual and auditory prosody correlated with power modulations of the corresponding delta oscillatory activity over left fronto-central EEG sensor areas. This result is in line with recent studies supporting the role of the left pre-SMA and the timing-processing network in multimodal speech processing. If so, the brain may extract the temporal structure of prosody conveyed at 1-3 Hz in sensory areas and converge congruent information in the left SMA to generate temporal predictions and facilitate the processing (and multimodal integration) of incoming speech. We expect to confirm these results soon with fMRI and to reveal the critical role of the temporal-processing network (i.e. SMA, basal ganglia, cerebellum) in audiovisual speech perception. Once done, we will recruit Parkinson's disease patients for the fMRI version of the experiment and compare their behavioral and neural responses with those of the present participants (WP1/WP2). For this reason, the results from this project will be particularly useful for guiding current and future interventions for the rehabilitation of Parkinson's disease patients, potentially based on compensating temporal-processing deficits.
Figures: experimental paradigm and conditions; behavioral results; time-frequency representations between contrasts; topographical representations of significant clusters.