Periodic Reporting for period 1 - RELEARN (Goal-directed learning of the statistical structure of the environment)
Reporting period: 2021-01-01 to 2022-12-31
The action produced both experimental and theoretical advances. On the experimental side, two novel paradigms were developed. On the theoretical side, it yielded a more complete understanding of the relationship between the existing frameworks of Control as Inference and the Information Bottleneck. Most importantly, it opened the entirely new research direction of representational planning, with implications for both machine learning and cognitive science.
An experimental paradigm has been developed using naturalistic images instead of simple geometric shapes, both to ensure that participants rely on the same perceptual processing as in real environments, and to increase engagement. In addition to the policy learning task, the same participants completed a read-out task, in which they made a series of two-alternative forced-choice decisions between two images based on familiarity. Through the careful design of the feature statistics of the presented image pairs, this task allows the content of the representation learned by the participant to be assessed.
Three results have been established: which parametrisations of the feature space allow participants to gradually learn the association with reward; that statistical learning of even non-trivial properties of the training set takes place during the completion of the paradigm; and that performance in the two tasks correlates in a way that supports the hypothesized dual learning process. Stimulus sets have been developed to assess whether reward properties in the learning task modulate statistical learning in accordance with a resource-rational learning procedure, providing the basis of an ongoing investigation. A second paradigm has been developed based on the tangram game. This choice directly supports testing compositional representations in a simplified but game-like setting, and the paradigm is designed to allow for the easy production of variants, such as tests for temporally extended planning or habituation in terms of representations.
The work has been presented at the Reinforcement Learning and Decision Making (RLDM) 2022 conference at Brown University (Providence, USA), as well as in invited lectures at Princeton University (USA), Brown University and the Central European University (Budapest, Hungary). Manuscripts detailing my work during the Fellowship as well as follow-up projects are being prepared for publication.