Neural correlates of predictive mechanisms in multisensory perception

Final Report Summary - PREDICTIVENEUROSENS (Neural correlates of predictive mechanisms in multisensory perception)

In many real-life situations, what we see is strongly affected by what we hear, and vice versa; the integration of information across the senses can impair, improve or create novel percepts. How does the brain integrate information across the different sensory modalities? PredictiveNeurosens focused specifically on how auditory and visual information is integrated in the human brain. The project addressed three major aims: (i) understanding the constraints under which auditory and visual information is automatically integrated, (ii) providing a better understanding of the multisensory contingencies that could lead to a supramodal operating mode of brain processes and (iii) opening new avenues for clinical applications.

1. Automaticity in multisensory integration
In natural environments, sensory information is embedded in temporally contiguous streams of events. This is typically the case when seeing and listening to a speaker [1] or when engaged in scene analysis. In such contexts, two mechanisms are needed to single out and build a reliable representation of an event (or object): the temporal parsing of information and the selection of relevant information in the stream. It has previously been shown that rhythmic events naturally build temporal expectations that improve sensory processing at predictable points in time.
In a first study [2], we focused specifically on the temporal predictability afforded by sounds in the context of a visual search task and asked to what extent temporal regularities can improve the detection and identification of events across sensory modalities. To do so, we used a dynamic visual conjunction search task accompanied by auditory cues that were either synchronized or desynchronized with the color changes of the visual target (a horizontal or vertical bar). We showed that sounds synchronized with the visual target improved search efficiency for temporal rates below 1.4 Hz but did not affect efficiency above that stimulation rate. Conversely, desynchronized auditory cues consistently impaired visual search. These results suggest that a cognitive operation structures events in time irrespective of the sensory modality of input; crucially, such temporal structuring benefits visual discrimination. These results support and refine recent neurophysiological findings by showing strong temporal selectivity in audiovisual integration; they also provide a first insight into the temporal and attentional constraints on automaticity in multisensory integration.

2. Supramodal processing
Increasing evidence for multisensory integration throughout cortex has challenged the view that sensory systems are strictly independent. This, in turn, questions the innate specialization of sensory cortices: for instance, in congenitally blind humans, hMT+ (the human homolog of MT/V5 in monkeys, an area responsive to visual motion) undergoes functional recycling for auditory or tactile processing. In other words, some cortical areas may naturally be capable of functional selectivity irrespective of the sensory modality of inputs, hence of functional recycling.
In this study [3], we asked whether learning to discriminate difficult visual patterns would benefit from additional auditory information. To test this hypothesis, three groups of participants were briefly trained to discriminate which of two intermixed populations of dots (red or green) in random-dot kinematograms (RDKs) was the more coherent in a visual display, while their brain activity was recorded with magnetoencephalography (MEG). During training, participants either performed the task without sound (visually trained group), heard congruent acoustic textures (audiovisual trained group) or heard auditory noise (control group). Importantly for our hypothesis, the acoustic textures heard by the audiovisual trained group shared the temporal statistics of the visual RDKs: the auditory and visual information were coherent in time.
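To make the notion of coherence in an RDK concrete, the following is a minimal illustrative sketch, not the stimulus code used in the study. It assumes a common textbook definition in which a fraction of the dots (the coherence level) moves in one shared direction on each frame while the remaining dots move in random directions. The function names, the dot counts, the coherence values (0.6 and 0.3), the speed and the shared upward direction are all hypothetical choices made for illustration only.

```python
import math
import random


def make_dots(n, rng):
    """Place n dots uniformly at random in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def step_dots(dots, coherence, direction, speed, rng):
    """Advance all dots by one frame.

    A fraction `coherence` of the dots (the signal dots) moves along
    `direction` (radians); every other dot moves along a random angle.
    Positions wrap around the edges of the unit square.
    Returns the new positions and the angle used for each dot.
    """
    n_signal = round(coherence * len(dots))
    signal_idx = set(rng.sample(range(len(dots)), n_signal))
    new_dots, angles = [], []
    for i, (x, y) in enumerate(dots):
        if i in signal_idx:
            theta = direction
        else:
            theta = rng.uniform(0.0, 2.0 * math.pi)
        new_dots.append(((x + speed * math.cos(theta)) % 1.0,
                         (y + speed * math.sin(theta)) % 1.0))
        angles.append(theta)
    return new_dots, angles


def coherence_estimate(angles, direction, tol=1e-9):
    """Fraction of dots whose motion angle matches the shared direction."""
    return sum(abs(a - direction) < tol for a in angles) / len(angles)


# Hypothetical display: two intermixed populations with different coherence.
rng = random.Random(0)
up = math.pi / 2  # assumed shared motion direction
red, red_angles = step_dots(make_dots(100, rng), 0.6, up, 0.01, rng)
green, green_angles = step_dots(make_dots(100, rng), 0.3, up, 0.01, rng)

# The task amounts to reporting which population is the more coherent one.
red_c = coherence_estimate(red_angles, up)
green_c = coherence_estimate(green_angles, up)
answer = "red" if red_c > green_c else "green"
```

In this sketch the discrimination is trivial because the motion angles are read directly; an observer only has access to the dot displacements on screen, which is what makes low coherence levels hard to tell apart.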
After training, we found that the audiovisual trained group significantly outperformed both the visually trained group and the control group. Although participants in the audiovisual group benefited from auditory inputs, they were not aware of their progress, suggesting that learning in this task was implicit and did not massively engage attentional processing. Before and after training, all participants were evaluated on the same set of RDKs without sound. When contrasting the MEG data collected in these experimental blocks, selective differences were observed in the dynamic pattern and the cortical loci responsive to the presentation of visual RDKs. The learning history therefore affected how the brain subsequently analyzed visual stimuli. First, in all three groups, the ventrolateral prefrontal cortices (vlPFC) showed selectivity to the learned visual coherence levels, whereas selectivity in the visual motion area hMT+ was seen only in the group trained in audiovisual conditions. Second, and only in the audiovisual group, activity in multisensory cortices (specifically, the middle and posterior superior temporal sulcus) correlated with post-training performance. Additionally, the latencies of these effects suggest feedback from vlPFC to hMT+, possibly mediated by temporal cortices, in the audiovisual trained group.
Altogether, we interpret our results in the context of the Reverse Hierarchy Theory of learning (Ahissar and Hochstein, 2004), in which supramodal processing optimizes visual perceptual learning by capitalizing on sensory-invariant representations. In other words, the brain optimizes learning on a visual task by making use of available auditory information even when the relationship between the two sources of information is relatively abstract (here, the global spectrotemporal coherence levels across sensory modalities).

3. Schizophrenia
In a third study, we focused on the reported impairments in multisensory integration of patients with schizophrenia [4]. Several experimental studies have provided evidence that sensory binding and the temporal structuring of events may be impaired in patients with schizophrenia. To address this issue, 26 patients and their matched controls took part in two studies using desynchronized audiovisual speech [1]. Two main tasks were used and compared: an identification task, in which participants reported what they heard while looking at the face, and a simultaneity task, in which they judged whether the auditory and visual speech stimuli were simultaneous. In both tasks, we used McGurk fusion and combination, which are classic and ecologically valid multisensory illusions. First, our results suggest that, contrary to previous studies, patients do not significantly differ from controls in their rate of illusory reports. Second, the illusory reports in the identification task were slightly more sensitive to audiovisual asynchronies in patients than in controls. Third, and surprisingly, patients considered audiovisual speech to be synchronized for longer audiovisual asynchronies than controls did. As such, the temporal tolerance profile to audiovisual speech asynchrony was less of a predictor in schizophrenia than in controls. We interpret our results as an impairment of the structuring of events in schizophrenia that does not specifically affect speech processing but rather the explicit access to timing information associated with audiovisual speech processing.

Impact and use and any socio-economic impact of the project:
The results obtained in PredictiveNeurosens suggest that the temporal structure of multisensory features can profoundly affect the analysis of sensory information in the brain. The use of sensory features that naturally map across sensory modalities provides a major step towards understanding the representation of multisensory invariance or supramodal/abstract objects in the brain. As such, practical implications of this research are foreseeable for the optimization of sensory substitution devices that make use of natural cross-sensory mapping in audition, somatosensation and vision (see for instance the seminal work of Bach-y-Rita and Kercel, 2003; Amedi et al., 2007).

[1] van Wassenhove V (2013) Speech through ears and eyes: interfacing the senses with the supramodal brain. Front. Psychol. 4:388. doi:10.3389/fpsyg.2013.00388
[2] Kösem A, van Wassenhove V (2012) Temporal structure in audiovisual sensory selection. PLoS ONE 7(7):e40936. doi:10.1371/journal.pone.0040936
[3] Zilber N, Ciuciu P, Gramfort A, Azizi L, van Wassenhove V (in press) Supramodal processing optimizes visual perceptual learning and plasticity. 2014 Feb 22.
[4] Martin B, Giersch A, Huron C, van Wassenhove V (2013) Temporal event structure and timing in schizophrenia: preserved binding in a longer “now”. Neuropsychologia 51:358-371