
The neural basis of visual interaction between scenes and objects

Periodic Reporting for period 1 - SEEING FROM CONTEXT (The neural basis of visual interaction between scenes and objects)

Reporting period: 2016-06-01 to 2018-05-31

When we see a boat on the water, we can immediately recognize it, even if it is far away or unclear on its own. What helps us identify such ambiguous visual elements is the context within which they are presented. If we could not see the water, but only the blurry distant boat, we would be less likely to recognize it. The main objective of SEEING FROM CONTEXT was to characterize the neural mechanisms underlying such interactions between scene and object processing in the perception of real-world scenes. The project thereby addressed a cornerstone challenge in vision neuroscience: how the brain generates a coherent percept of complex visual stimuli even under sub-optimal viewing conditions such as distance, fog, or darkness. Specifically, we used fMRI to test whether scene-object contextual interaction takes place within brain areas specialized for scene processing and object processing, and we used MEG to test when, along the time-course of the neural response, scene-object integration takes place.
1. Do global and local contextual cues shape the neural representation of objects and scenes? The first part of SEEING FROM CONTEXT examined whether representations of ambiguous objects in object-processing brain regions are shaped by scene context, and whether representations of ambiguous scenes in scene-processing brain regions are shaped by object cues. In the fMRI experiments, we presented participants with images of degraded objects with or without scene context (n=19), or with images of degraded scenes with or without an intact object (n=18). We then used multivariate pattern analysis (MVPA) to extract the representations of object and scene category across the brain. We found that scene context shapes the representations of objects within the object-selective pathway (Brandman and Peelen, 2017, J Neurosci), and that object cues shape the representations of scenes within the scene-selective pathway (Brandman and Peelen, in press, J Cog Neurosci). These studies provide the first direct evidence for the interaction of scene- and object-selective pathways in the brain, suggesting that they are not isolated visual modules but rather an interactive network of complementary pathways in the visual processing of real-world scenes. In addition, we found that object cues improved scene-selective representations only in the left hemisphere, whereas scene-selective representations in the right hemisphere, as well as in early visual areas, were not improved by objects. These findings demonstrate separate roles for left and right scene-selective cortex in scene representation, whereby left scene areas represent inferred scene layout, influenced by contextual object cues, and right scene areas, as well as early visual areas, represent a scene’s visual features (Brandman and Peelen, 2018, J Cog Neurosci).
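To give a rough sense of the MVPA logic described above, the sketch below decodes stimulus category from simulated multi-voxel activity patterns with a cross-validated linear classifier. All data, dimensions, and parameter values here are hypothetical illustrations, not the project's actual data or pipeline.

```python
# Illustrative MVPA decoding on simulated data (all values hypothetical):
# classify stimulus category from multi-voxel activity patterns using a
# cross-validated linear classifier, the standard logic behind MVPA.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 100                  # hypothetical trials x voxels
labels = np.repeat([0, 1], n_trials // 2)     # two stimulus categories

# Each category adds a small, consistent signal on top of noise,
# mimicking category-selective voxel responses.
category_signal = rng.standard_normal((2, n_voxels))
patterns = rng.standard_normal((n_trials, n_voxels)) + 1.5 * category_signal[labels]

# 5-fold cross-validation: train on most trials, decode the held-out rest.
scores = cross_val_score(LinearSVC(), patterns, labels, cv=5)
accuracy = scores.mean()  # above-chance accuracy indicates category information
```

In a real analysis the rows would be fMRI response patterns from a region of interest (e.g. object- or scene-selective cortex), and above-chance decoding of category in that region is taken as evidence that the region carries category information.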
2. When do contextually-induced neural representations of objects and scenes emerge? The second part of SEEING FROM CONTEXT aimed to reveal how long it takes, from the moment we see a visual scene, to generate a representation of a feature that is contextually defined by the complementary stream. We therefore used MEG with a paradigm similar to the fMRI experiments, to test the effects of scenes on the time-course of object representation (n=25) and the effects of objects on the time-course of scene representation (n=28). We found that in both cases, interactive processes peaked at around 320 ms after visual onset (Brandman and Peelen, 2017, J Neurosci; Brandman and Peelen, in preparation). This timing is 100 ms later than the peak representation of intact isolated objects. Taken together with the fMRI results, this suggests a longer route for interactive scene-object processing, in which visual information is processed along both pathways and then projected onto the complementary pathway, resulting in a delayed sharpening of the representation.
These data, together with online behavioral data collected for these experiments (n>100), have been presented at international conferences and published in top journals of the field. We are currently working towards promoting a new approach to the dual-pathway concept of object and scene processing, via a review article. Our findings also gave rise to two follow-up questions, leading to two additional studies. In the first study, we asked whether scene-object contextual integration is an automatic process or whether it is gated by attention. This was tested in the MEG using a paradigm similar to the previous experiments, with an additional manipulation of attention, and is currently under analysis. In the second study, we asked whether object representations are shaped not only by contextual visual information but also by external non-visual information. We therefore tested the effects of auditory and semantic input on the visual representation of objects in the MEG. We found that both words and natural sounds facilitated the representations of objects, and that words were more effective facilitators than natural sounds, implying that words and sounds engage separate routes in the facilitation of visual perception.
Altogether, the findings of SEEING FROM CONTEXT characterize, in space and time, the functional interactions between scene- and object-processing pathways. In doing so, these data provide direct evidence for the interactive nature of object and scene processing already at a perceptual stage, i.e. within visual processing. Importantly, by advancing our understanding of these contextual integration processes, our findings provide a clear answer to the long-standing debate on whether such visual pathways are encapsulated or interactive. The concept of perceptual modularity stipulates that distinct perceptual modules, such as the object- and scene-selective pathways, operate in parallel and are functionally isolated from one another. SEEING FROM CONTEXT acts as a game changer in the field, challenging the functional isolation model and redefining object and scene modularity as two distinct yet complementary streams that operate in tandem, actively communicating information at a perceptual (visual) stage of processing. Broadly, our findings suggest that object and scene representations in the visual cortex do not faithfully reflect the physical detail present in our visual field, but rather reflect inferences, based on implicit knowledge of scene-object co-occurrences, as well as visual-auditory co-occurrences, in our daily-life environments.