Characterizing neural mechanisms underlying the efficiency of naturalistic human vision

Periodic Reporting for period 3 - NATVIS (Characterizing neural mechanisms underlying the efficiency of naturalistic human vision)

Reporting period: 2020-09-01 to 2022-02-28

The efficient detection of goal-relevant objects in our environment is of critical importance in daily life. For example, the majority of road accidents are caused by insufficient attention to relevant objects (e.g. pedestrians) in scenes. Our daily-life visual environments, such as city streets and living rooms, contain a multitude of objects. Out of this overwhelming amount of sensory information, our brains must efficiently select and recognize those objects that are relevant to current goals. Our visual and attentional systems have evolved to perform such real-world tasks optimally, as reflected in the remarkable efficiency of naturalistic object detection. It is increasingly appreciated that the brain makes use of a wide range of available information to facilitate object detection in real scenes. This project aims to characterize the neural mechanisms that contribute to the efficiency of goal-directed naturalistic human vision.

The brain systems underlying the detection of task-relevant information in cluttered displays have primarily been studied using artificial and highly simplified displays. While these studies have been fundamentally important for revealing basic neural mechanisms involved in perception and attention, they fall short of fully explaining how the brain so rapidly detects familiar objects in complex but meaningful real-world scenes. A large body of behavioural work has shown that the detection of goal-relevant objects in real-world scenes is dramatically more efficient than the detection of targets in apparently much simpler artificial displays. The goal of this project is to understand why this is so, using psychophysics, fMRI, MEG, and TMS to improve our understanding of the neural mechanisms underlying the efficiency of object detection in natural scenes.
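In this literature, search efficiency is commonly quantified as the slope of response time over display set size (ms per item): shallow slopes indicate efficient, parallel search, steep slopes indicate inefficient search. The sketch below illustrates this measure with purely hypothetical numbers (not project data):

```python
import numpy as np

# Hypothetical mean reaction times (ms) at increasing display set sizes;
# the values are illustrative only, not data from this project.
set_sizes = np.array([4, 8, 16, 32])
rt_artificial = np.array([620, 740, 980, 1460])   # steep slope: inefficient search
rt_naturalistic = np.array([540, 555, 575, 610])  # shallow slope: efficient search

# Search slope (ms per additional item) via a least-squares linear fit
slope_art = np.polyfit(set_sizes, rt_artificial, 1)[0]
slope_nat = np.polyfit(set_sizes, rt_naturalistic, 1)[0]
print(f"artificial: {slope_art:.1f} ms/item, naturalistic: {slope_nat:.1f} ms/item")
```

A much shallower slope for naturalistic scenes than for artificial arrays is the behavioural signature of efficient real-world object detection referred to above.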
In the experiments performed so far, we have studied several factors that contribute to the efficiency of naturalistic object detection, as described in more detail below:

Interactions between scene and object processing
In naturalistic vision, objects and scenes are processed interactively, with scene information informing object recognition and object information informing scene recognition (e.g. seeing a boat supports the inference that the foggy scene is a lake). In behavioural, fMRI, MEG, and TMS studies we have revealed interactions between scene and object processing during the viewing of natural scenes. For example, using fMRI, we tested how object perception supports scene recognition (Brandman & Peelen, 2019). We examined the neural representation of scene category (indoor, outdoor) in scenes that were difficult to recognize on their own but easily disambiguated by the inclusion of an object. We found that objects play an important role in the processing of real-world scenes. Specifically, our results showed that the representation of scene layout in scene-selective brain regions (PPA/OPA) was facilitated by contextual object cues. This effect was strongly left-lateralized, demonstrating separate roles for the left and right PPA/OPA in the representation of scenes: the left PPA/OPA represented inferred scene layout, influenced by contextual object cues, while the right PPA/OPA represented a scene's global visual features. Together with other ongoing studies, we are now beginning to better understand the extensive interactions between object- and scene-selective pathways in visual cortex.
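To illustrate the general logic of such analyses (this is not our actual pipeline), multivariate decoding asks whether scene category can be read out from voxel response patterns in a region of interest. The toy sketch below uses synthetic patterns and a leave-one-trial-out correlation classifier; all names, sizes, and noise levels are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
n_voxels, n_trials = 100, 30  # hypothetical ROI size and trials per category

# Category-specific "true" activation patterns (e.g. indoor vs. outdoor)
templates = {c: rng.normal(size=n_voxels) for c in ("indoor", "outdoor")}

# Simulated single-trial patterns: category template plus measurement noise
trials = [(c, templates[c] + rng.normal(scale=1.5, size=n_voxels))
          for c in ("indoor", "outdoor") for _ in range(n_trials)]

def corr(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Leave-one-trial-out: classify each trial by its correlation with the
# mean pattern of each category, computed from the remaining trials.
correct = 0
for i, (label, pattern) in enumerate(trials):
    means = {c: np.mean([p for j, (l, p) in enumerate(trials)
                         if j != i and l == c], axis=0)
             for c in ("indoor", "outdoor")}
    pred = max(means, key=lambda c: corr(pattern, means[c]))
    correct += (pred == label)

accuracy = correct / len(trials)
print(f"decoding accuracy: {accuracy:.2f}")  # well above chance (0.50)
```

Above-chance accuracy indicates that the region's response patterns carry category information; the study cited above asked how such decoding is modulated by contextual object cues.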

Inter-object grouping based on real-world regularities
We seem to recognize objects effortlessly even when they are embedded in highly complex scenes. We recently proposed that this efficiency is partially accounted for by perceptual adaptations to regularities in real-world environments (Kaiser et al., 2019), including the perceptual grouping of highly predictable constellations of objects (e.g. a lamp above a table). In one project (Quek & Peelen, submitted), we used EEG to investigate how identity-based and positional regularities interact during visual object processing. We found that the degree to which the visual system integrates identity information carried by concurrent objects critically depended on their relative position, an effect that was most pronounced over right occipitotemporal cortex and peaked around 300 ms after stimulus onset. These results provide evidence that the spatial and identity relations between objects influence object processing interactively, with these regularities jointly facilitating object perceptibility itself. The finding that presenting objects in their typical relative positions enhanced the representation of their contextual association sheds new light on visual adaptations that simplify scene analysis, showing that multi-object grouping based on positional regularities serves to reduce competition between high-level stimuli.

Attention in real-world scenes
When searching for an object in our environment, we maintain a visual template of the object in memory, causing our visual system to favor template-matching visual input. Although template-based visual selection has been extensively studied in laboratory settings, where objects are presented in isolation (Figure 1, right panel), it remains an open question how this mechanism works in more complex scenes, where objects produce vastly different images on the retina depending on where they are located in the scene (Figure 1, left panel). For example, two objects that produce retinal images of the same size can be of vastly different real-world sizes if one is nearby and the other is farther away. In one project (Gayet & Peelen, 2019), we studied the interaction between the top-down attentional set, maintained in working memory, and the contextual modulation of object appearance. We presented objects within a naturalistic visual scene, at locations appearing either near to or far from the observer. As expected, subjects perceived far objects to be larger than near objects, even though both objects subtended the same retinal size. Critically, objects automatically attracted attention when their perceived size matched the size of a memory template, compared with mismatching objects that had the same retinal size. These findings demonstrate that memory templates affect the processing of visual input after object-scene integration has occurred, providing a mechanism for effective template-based visual selection under naturalistic viewing conditions.
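The core idea can be illustrated with a toy model (the Gaussian match function and all numbers below are our illustrative assumptions, not the model tested in the study): attentional priority depends on how well the memory template matches the object's perceived, distance-rescaled size, rather than its raw retinal size.

```python
import math

def perceived_size(retinal_size, distance):
    # Size constancy: the same retinal image at a greater inferred
    # distance implies a larger real-world object.
    return retinal_size * distance

def template_match(template_size, retinal_size, distance, sigma=0.5):
    # Hypothetical Gaussian match between template and perceived size
    diff = perceived_size(retinal_size, distance) - template_size
    return math.exp(-diff ** 2 / (2 * sigma ** 2))

# Two objects with identical retinal size at different scene depths,
# searched for with a template of a large (real-world size 2.0) object
near = template_match(template_size=2.0, retinal_size=1.0, distance=1.0)
far = template_match(template_size=2.0, retinal_size=1.0, distance=2.0)
print(far > near)  # True: the far object, perceived as larger, matches the template
```

In this sketch, matching operates on the scene-integrated (perceived) size, which is the signature result of the study: identical retinal images attract attention differently depending on their apparent distance in the scene.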
We expect many more breakthroughs in the coming reporting period, with ongoing studies combining multiple methods to investigate the roles of expectations, attention, and working memory in object detection in naturalistic scenes. Other developments that we are excited about include: the creation of naturalistic scenes using architecture software, allowing for full experimental control, for example when manipulating relative object position; the use of short movies in which we create expectations about object appearance across time based on scene layout; the incorporation of deep neural networks as models of visual cortex and of optimal attention schemes; and the use of a highly effective TMS localization procedure to probe the causal contribution of scene- and object-selective brain regions to naturalistic object recognition.
Figure 1. How does template-based search work in real-world scenes?