Using simulations and three experiments, I have now validated the new dRSA approach. Each experiment provides valuable new insights into the representational dynamics of naturalistic input.

First, I found evidence for neural predictions in MEG data of participants watching ballet videos (Fig. 1a, top; qualitatively equivalent to published results in Nat Comm). Future motion of the dancer is predicted across hierarchical timescales, with high-level view-invariant body motion predicted earliest at ~500 msec, view-dependent body motion at ~200 msec, and low-level pixelwise motion (i.e. optical flow vectors) predicted closest to real time at ~120 msec. I also demonstrated that reducing stimulus familiarity by up-down inversion specifically reduces high-level (view-invariant) motion prediction (Fig. 1a, middle) while increasing mid-level (view-dependent) motion prediction. This indicates that predictions take place at the level at which we are familiar with and understand a stimulus, and that different hierarchical prediction streams are partly separable. Piecewise temporal scrambling of the videos (200-400 msec segments) attenuated all predictive representations (Fig. 1a, bottom) while increasing post-stimulus representation of body posture, possibly reflecting prediction errors in line with predictive coding theories.

In a third, ongoing study (Fig. 1b), I present participants twice with a 42-minute fragment of the audiovisual film 1917, a movie selected because it is edited as a single continuous shot with no scene or camera viewpoint changes, thus mimicking our daily visual experience. Since participants could move their eyes freely, I used only a circular region around the gaze location (measured with an eye tracker; Fig. 1b, middle) as model input. Initial analysis of the first 11 participants for two low-level visual models (Fig. 1b, bottom) indicates that rich naturalistic context results in earlier low-level representations than more isolated action sequences (i.e. peak latencies of pixelwise luminance and motion are 110 and -120 msec for the dance videos versus 40 and -260 msec for the movie). Narrative or contextual familiarity during the second movie viewing shifts representations even earlier in time (i.e. 20 and -300 msec, respectively).

These initial results demonstrate the unique ability of dRSA to capture neural representations across hierarchical levels, from perceptual to conceptual; they show how stimulus familiarity affects perceptual predictions and how predictions at different hierarchical levels can be manipulated independently. They also demonstrate the potential of dRSA in naturalistic settings, such as freely viewing an audiovisual movie only once. In other words, dRSA makes it possible for the first time to capture the representational dynamics of the world around us.
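
To make the reported peak latencies concrete, the following minimal sketch illustrates the lagged model-to-neural RDM comparison that underlies a dRSA latency estimate. The synthetic data, the 10-msec sampling step, all variable names, and the sign convention (negative lag = neural representation precedes the model, i.e. prediction) are illustrative assumptions; the actual analysis pipeline involves additional steps not shown here.

```python
# Minimal sketch of the lagged RDM comparison at the heart of dRSA.
# All data are synthetic; names, shapes, and the sign convention are
# illustrative assumptions, not the published analysis pipeline.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_videos, n_sensors, n_time = 14, 50, 200        # 200 samples at an assumed 10 msec step
model = rng.standard_normal((n_videos, n_sensors, n_time))    # model features (e.g. optical flow)
neural = rng.standard_normal((n_videos, n_sensors, n_time))   # MEG sensor data

# Plant a predictive signal: neural activity at time t partially encodes the
# model state 12 samples (~120 msec) in the future.
true_lead = 12
neural[:, :, :n_time - true_lead] += 0.7 * model[:, :, true_lead:]

def rdm_timecourse(data):
    """Condition-by-condition correlation-distance RDM at every time point."""
    return np.stack([pdist(data[:, :, t], metric='correlation')
                     for t in range(data.shape[2])])

neural_rdm = rdm_timecourse(neural)   # (n_time, n_pairs)
model_rdm = rdm_timecourse(model)

# Correlate the neural RDM at time t with the model RDM at time t - lag,
# averaged over t. A peak at negative lag means the neural representational
# geometry matches *future* model states, i.e. a predictive representation.
lags = np.arange(-50, 51)
drsa = np.empty(len(lags))
for i, lag in enumerate(lags):
    pairs = [(t, t - lag) for t in range(n_time) if 0 <= t - lag < n_time]
    drsa[i] = np.mean([pearsonr(neural_rdm[t], model_rdm[tm])[0] for t, tm in pairs])

print(f"peak latency: {lags[np.argmax(drsa)] * 10} msec")  # expected: -120 msec
```

With the planted 12-sample lead, the lag curve peaks at -120 msec, the same order of predictive latency as reported above for pixelwise motion in the dance videos.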