
Predicting the future: How our brain predicts future events for successful navigation through our dynamic environment.

Periodic Reporting for period 1 - PredictiveBrain (Predicting the future: How our brain predicts future events for successful navigation through our dynamic environment.)

Reporting period: 2023-01-01 to 2024-12-31

To successfully navigate our dynamic environment, our brain needs to continuously update its representation of external information. This poses a fundamental problem: how does the brain cope with a stream of dynamic input? It takes time to transmit and process information along the hierarchy of the visual system, yet our capacity to interact with dynamic stimuli in a timely manner (e.g. catching a ball) suggests that our brain generates predictions of unfolding dynamics. Without such predictions, there would be a substantial time lag between states in the external world, our perception of those states, and our subsequent reactions. This notion is supported by recent empirical evidence, e.g. for the prediction of the future position of moving dots in early visual cortex (V1). However, that approach is restricted to low-level visual information that can be represented retinotopically; while providing clear evidence for a neural prediction mechanism, it only targets the outcome of this process in early visual cortex. The temporal dynamics and representational nature of predictions, particularly at higher levels of processing, remain largely unexplored.

One approach for investigating neural representations is representational similarity analysis (RSA), which typically uses models of static stimulus features at different hierarchical levels of complexity (e.g. color, shape, category, concept) to investigate how these features are represented in the brain. Before the onset of the action, I developed a novel, powerful and versatile dynamic extension to RSA (dRSA) that makes it possible, for the first time, to quantify exactly what our brain represents, and when, during naturalistic continuous input such as movies, speech and music. In short, dRSA quantifies precisely how strongly continuous representations in the brain match a continuous feature of a naturalistic dynamic stimulus. Besides the veridicality (strength of match) of representations, it also quantifies with millisecond accuracy how representations temporally relate to (follow or precede) actual events.

Veridicality and latency of neural representations are reflected in the peak amplitude and peak latency of dRSA latency plots, respectively (Fig. 1). In the case of feedforward processing, one expects a lag between the model and the best-matching neural representation (i.e. the time needed for information to travel from the retina to V1), reflected in a peak to the right of the vertical zero-lag midline (e.g. pixelwise luminance in Fig. 1a). Prediction should reduce this lag, potentially making it negative (i.e. left of the zero-lag midline), in which case the representational content predicts the future model state. The overall objective of the action was to use this approach to characterise the representational dynamics of our brain under naturalistic continuous conditions.
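To make the logic concrete, the following minimal Python sketch illustrates the core dRSA computation described above: representational dissimilarity matrices (RDMs) are computed per time point for the neural data and for a stimulus model, and their similarity is evaluated across a range of temporal lags. All function names, array shapes and distance metrics here are illustrative assumptions, not the published pipeline.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def timepoint_rdms(data):
    """data: (n_stimuli, n_features, n_times) -> (n_times, n_pairs).
    One vectorised RDM (pairwise condition dissimilarities) per time point."""
    n_times = data.shape[2]
    return np.stack([pdist(data[:, :, t], metric="correlation")
                     for t in range(n_times)])

def drsa_lag_curve(neural_rdms, model_rdms, max_lag):
    """Correlate the model RDM at time t with the neural RDM at time t + lag,
    averaged over t. A peak at a positive lag reflects feedforward delay (the
    brain lags behind the stimulus); a peak at a negative lag means the neural
    representation precedes the model state, i.e. prediction."""
    n_times = neural_rdms.shape[0]
    lags = np.arange(-max_lag, max_lag + 1)
    curve = np.empty(len(lags))
    for i, lag in enumerate(lags):
        rs = [spearmanr(model_rdms[t], neural_rdms[t + lag])[0]
              for t in range(n_times) if 0 <= t + lag < n_times]
        curve[i] = np.mean(rs)
    return lags, curve

# Peak amplitude ~ veridicality; peak latency ~ temporal relation to the stimulus:
# lags, curve = drsa_lag_curve(neural_rdms, model_rdms, max_lag=100)
# peak_latency = lags[np.argmax(curve)]  # negative = predictive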
In a proof-of-concept MEG study in which healthy human participants observed videos of action sequences (i.e. ballet dancing), I used the new dynamic RSA approach to unveil predictive neural representations of naturalistic dynamic input. At the start of the action, I used this initial dataset to further optimise the method. Specifically, I optimised the metric used in dRSA to quantify the match between continuous stimulus models and continuous brain signals (i.e. MEG): I used principal component regression (PCR), which tests the variance uniquely explained by a specific model RDM while regressing out other covarying model RDMs. This is especially useful for naturalistic continuous stimuli, which inherently consist of multiple covarying features, where one feature at t = x covaries with another feature not only at t = x but across a range of time points surrounding x, due to stimulus-inherent temporal autocorrelations. Operationalising similarity as PCR allows dRSA to quantify representational veridicality and latency separately for any stimulus feature across hierarchical levels of complexity, making it a powerful and versatile approach. This finalised version and validation of the method was published (De Vries, I.E.J. et al. 2023, Nat Commun). Subsequently, I collected two further full MEG datasets: the first was a direct replication of the Nat Commun study with 43 new participants and two additional viewing conditions (up-down inversion and piecewise temporal scrambling); in the second, participants observed a naturalistic 42-minute audiovisual movie twice. See the Results section below for details; I am currently writing a manuscript on the replication results. Additionally, towards the end of the action I started data collection for a fourth dRSA MEG study, in which participants observe a naturalistic silent movie under several viewing conditions. Since data collection has only just started, there are no preliminary results for this study yet.
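As an illustration of the PCR idea, the following sketch (continuing the hypothetical vectorised-RDM format from the snippet above) regresses the neural RDM on the principal components of the full regressor set and maps the weights back onto the original regressors; the weight on the target model then reflects variance it uniquely explains. The number of components, the regressor set and the lag handling are assumptions, and the published pipeline may differ.

import numpy as np

def pcr_unique_weight(neural_t, target_t, nuisance_rdms, n_components=10):
    """neural_t, target_t: (n_pairs,) vectorised RDMs at one (time, lag) point.
    nuisance_rdms: (n_nuisance, n_pairs) covarying model RDMs, possibly taken
    at several surrounding lags to absorb temporal autocorrelation.
    Returns the regression weight on the target model after PCR."""
    X = np.column_stack([target_t, *nuisance_rdms])  # (n_pairs, 1 + n_nuisance)
    X = X - X.mean(axis=0)
    y = neural_t - neural_t.mean()
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, int(np.sum(s > 1e-10)))  # keep well-conditioned PCs
    gamma = (U[:, :k].T @ y) / s[:k]               # weights in PC space
    beta = Vt[:k].T @ gamma                        # map back to regressors
    return beta[0]                                 # unique weight of the target

Because the nuisance regressors can be entered at a range of lags around the time point of interest, the stimulus-inherent temporal autocorrelations mentioned above are absorbed before the target model's unique contribution is read out.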
Using simulations and three experiments, I have now validated the new dRSA approach. Each experiment provides valuable new insights into the representational dynamics of naturalistic input. First, I found evidence for neural predictions in MEG data of participants watching ballet videos (Fig. 1a, top; qualitatively equivalent to the published results in Nat Commun). Future motion of the dancer is predicted across hierarchical timescales, with high-level view-invariant body motion predicted earliest at ~500 msec, view-dependent body motion at ~200 msec, and low-level pixelwise motion (i.e. optical flow vectors) predicted closest to real time at ~120 msec. I also demonstrated that reducing stimulus familiarity by up-down inversion specifically reduces high-level (view-invariant) motion prediction (Fig. 1a, middle) while increasing mid-level (view-dependent) motion prediction. This indicates that predictions take place at the level at which we are familiar with and understand a stimulus, and that different hierarchical prediction streams are partly separable. Temporal piecewise (200-400 msec) scrambling of the videos attenuated all predictive representations (Fig. 1a, bottom) while increasing post-stimulus representation of body posture, possibly reflecting prediction errors in line with predictive coding theories. In a third, ongoing study (Fig. 1b), I present participants twice with a 42-minute fragment of the audiovisual film 1917, a movie selected because it is edited as a single shot with no scene or camera-viewpoint changes, thus mimicking our daily experience. Since participants can make free eye movements, I use only a circular region around gaze location (measured with an eye tracker; Fig. 1b, middle) as model input. An initial analysis of the first 11 participants for two low-level visual models (Fig. 1b, bottom) indicates that rich naturalistic context results in earlier low-level representations compared to more isolated action sequences (i.e. peak latencies for pixelwise luminance and motion of 110 and -120 msec for the dance videos versus 40 and -260 msec for the movie). Narrative or contextual familiarity on the second movie viewing shifts representations even earlier in time (i.e. 20 and -300 msec, respectively). These initial results demonstrate the unique ability of dRSA to capture neural representations across hierarchical levels, from perceptual to conceptual; they show how stimulus familiarity affects perceptual predictions and how predictions at different hierarchical levels can be manipulated independently. They also demonstrate the potential of dRSA in naturalistic settings such as freely viewing an audiovisual movie only once. In other words, dRSA makes it possible for the first time to capture the representational dynamics of the world around us.
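For the free-viewing movie study, the gaze-contingent model input can be illustrated with a small sketch: each video frame is masked with a circular aperture centred on the measured gaze position before stimulus features such as pixelwise luminance or optical flow are computed. The aperture radius, coordinate conventions and function name are assumptions for illustration only.

import numpy as np

def gaze_aperture(frame, gaze_xy, radius_px):
    """frame: (H, W[, C]) array; gaze_xy: (x, y) gaze position in pixels.
    Returns the frame with everything outside a circle around gaze zeroed,
    so that downstream stimulus models only see what was foveated."""
    h, w = frame.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (xx - gaze_xy[0]) ** 2 + (yy - gaze_xy[1]) ** 2 <= radius_px ** 2
    out = np.zeros_like(frame)
    out[mask] = frame[mask]
    return out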
Figure 1. Summary of the main results from the action.