Goal-directed learning of the statistical structure of the environment

Project Information

RELEARN

Grant agreement ID: 897042

DOI

10.3030/897042

Project closed

EC signature date 30 November 2020

Start date 1 January 2021

End date 31 December 2022

Funded under

EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions

Total cost

€ 174 806,40

EU contribution

€ 174 806,40

Coordinated by

MAX-PLANCK-GESELLSCHAFT ZUR FORDERUNG DER WISSENSCHAFTEN EV
Germany

Periodic Reporting for period 1 - RELEARN (Goal-directed learning of the statistical structure of the environment)

Reporting period: 2021-01-01 to 2022-12-31

How humans learn representations of their environment is a question to which cognitive science and neuroscience currently offer only fragmentary answers. Existing theoretical frameworks do not incorporate information-directed exploration into the representation learning process, and existing experimental paradigms do not aim to assess representational dynamics directly through behaviour. Advances on both fronts are required to produce a useful model of human task-based representation learning, one that predicts behaviour observable on the short time scales of practical experiments and thereby allows algorithmic hypotheses to be evaluated empirically. Theoretical approaches that address separate aspects of this problem, such as Bayesian inference and model-based reinforcement learning, need to be reconciled and extended with an explicit account of meta-cognitive decision making about perceptual compression. Experimental paradigms have to be extended so that they present naturalistic decision-making problems to participants and allow learned representations to be evaluated both behaviourally and neurally using imaging methods.
The action produced experimental advances, in the form of two novel experimental paradigms, and theoretical ones, including a more complete understanding of the relationship between the existing frameworks of Control as Inference and the Information Bottleneck. Most importantly, it opened the entirely new research direction of representational planning, with implications for both machine learning and cognitive science.

To model representation learning in a task-based setting, a mathematical framework that unifies Bayesian inference with reinforcement learning (RL) is needed. An algorithm that addresses both representation learning and policy learning has been developed on the basis of the Information Bottleneck framework, together with a way to formalise nontrivial resource constraints. Subsequently, to address the myopic way in which similar algorithms make decisions about the agent’s representations, the novel formalism of representational planning has been introduced. This formalism explicitly models the agent’s choice of representations over time, making representation learning a meta-cognitive decision-making problem. Such agents can plan ahead in the joint space of representations and behavioural actions using tree search algorithms. It has been demonstrated how the representational dynamics of such an agent, and thus its generalisation across stimuli, depend on the constraints on its computational resources and on the hyperparameters of its learning algorithm. Using this algorithm, one can make specific predictions about particular decision-making situations that involve a representational component.
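As an illustration of the kind of trade-off the Information Bottleneck framework formalises, the sketch below evaluates the standard Information Bottleneck Lagrangian, I(X;Z) - beta * I(Z;Y), for a small discrete problem. It is not the algorithm developed in the project: the function names, the toy distribution and the value of beta are assumptions made for illustration only, and the RL and resource-constraint components described above are left out.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_a[a] * p_b[b]))
    return mi


def ib_lagrangian(p_xy, encoder, beta):
    """Standard IB objective I(X;Z) - beta * I(Z;Y), to be minimised.

    p_xy[x][y]    : joint distribution of stimulus x and task variable y
    encoder[x][z] : stochastic representation p(z|x); Z depends on Y only via X
    """
    n_x, n_y, n_z = len(p_xy), len(p_xy[0]), len(encoder[0])
    p_x = [sum(row) for row in p_xy]
    # p(x, z) = p(x) * p(z|x)
    p_xz = [[p_x[x] * encoder[x][z] for z in range(n_z)] for x in range(n_x)]
    # p(z, y) = sum_x p(x, y) * p(z|x)
    p_zy = [
        [sum(p_xy[x][y] * encoder[x][z] for x in range(n_x)) for y in range(n_y)]
        for z in range(n_z)
    ]
    return mutual_information(p_xz) - beta * mutual_information(p_zy)


# Hypothetical toy problem: four equally likely stimuli, two of which map to
# each value of a binary task variable.
p_xy = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # keep everything
merged = [[1, 0], [1, 0], [0, 1], [0, 1]]  # keep only the task-relevant split
constant = [[1], [1], [1], [1]]  # discard everything

for name, enc in [("identity", identity), ("merged", merged), ("constant", constant)]:
    print(name, round(ib_lagrangian(p_xy, enc, beta=2.0), 4))
```

In this toy case the merged encoder attains the lowest value of the three: it keeps all the information about the task variable while discarding the distinction between stimuli that the task does not require. Generalisation across stimuli can depend on resource constraints in this way.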

An experimental paradigm has been developed using naturalistic images instead of simple geometric shapes, both to ensure that participants rely on the same perceptual processing as in real environments, and to increase engagement. In addition to the policy learning task, the same participants did a read-out task, in which they had to make a series of two-alternative forced-choice decisions between two images based on familiarity. This task, through the careful design of the feature statistics of the presented image pairs, allows for the assessment of the content of the representation learned by the participant.
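One simple way to summarise a read-out task of this kind is sketched below. It assumes, purely for illustration, that in each image pair one image matches the feature statistics of the training set and the other does not, and it tests whether a participant picks the matching image more often than chance with an exact binomial test. The function names and the coding of trials are hypothetical, and this is not necessarily the analysis used in the project.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than the
    observed count k under the null hypothesis of success probability p.
    """
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def readout_summary(choices):
    """Summarise two-alternative forced-choice trials.

    choices: list of booleans, True when the participant chose the image
    assumed to match the training-set statistics (hypothetical coding).
    """
    n = len(choices)
    k = sum(choices)
    return {
        "n_trials": n,
        "n_consistent": k,
        "proportion": k / n,
        "p_value": binomial_two_sided_p(k, n),
    }


# Made-up example data: 16 of 20 trials favour the statistics-consistent image.
example = [True] * 16 + [False] * 4
print(readout_summary(example))
```

A proportion reliably above 0.5 would suggest that the corresponding feature statistics were learned. Comparing pairs that differ in different features would indicate which parts of the training distribution entered the learned representation.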

It has been established which parametrisations of the feature space allow participants to learn the association to reward gradually, that statistical learning of even non-trivial properties of the training set takes place during the completion of the paradigm, and that performance in the two tasks correlates in a way that supports the hypothesis of a dual learning process. Stimulus sets have been developed to assess whether reward properties in the learning task modulate statistical learning in accordance with a resource-rational learning procedure, providing the basis of an ongoing investigation. A second paradigm has been developed based on the tangram game. This choice directly supports testing compositional representations in a simplified but game-like setting, and the paradigm is designed to allow for the easy production of variants, such as tests for temporally extended planning or habituation in terms of representations.

The work has been presented at the Reinforcement Learning and Decision Making (RLDM) 2022 conference at Brown University (Providence, USA), as well as in invited lectures at Princeton University (USA), Brown University and the Central European University (Budapest, Hungary). Manuscripts detailing my work during the Fellowship as well as follow-up projects are being prepared for publication.

The largest theoretical impact of the project is the establishment of the new research direction of representational planning. This theoretical formalism allows for the formulation of entirely novel hypotheses about human representational dynamics and the behavioural choices they imply, and has several direct applications in developing algorithms for artificial learning agents in heavily resource-restricted environments as well as in modelling human or animal learning. The experimental efforts yielded a software setup to produce naturalistic stimuli for decision-making experiments, as well as a two-part paradigm in which effects of learning are directly testable. The analysis of the recorded data provided a wealth of practical considerations about stimuli using rendered graphics and two-part paradigms with a read-out component. The tangram paradigm, inspired by representational planning, continues to serve as the basis of ongoing research projects.
