Modelling representation learning in a task-based setting requires a mathematical framework that unifies Bayesian inference with reinforcement learning (RL). An algorithm that addresses both representation learning and policy learning has been developed, based on the Information Bottleneck framework combined with a way to formalise nontrivial resource constraints. Comparable algorithms choose the agent’s representations myopically; to address this, the novel formalism of representational planning has been introduced. This formalism explicitly models the agent’s choice of representations over time, turning representation learning into a meta-cognitive decision-making problem. Such agents can plan ahead in the joint space of representations and behavioural actions using tree search algorithms. It has been demonstrated how the representational dynamics of such an agent, and thus its generalisation across stimuli, depend on the constraints on its computational resources and on the hyperparameters of its learning algorithm. Using this algorithm, one can make specific predictions about particular decision-making situations that involve a representational component.
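As an illustration of the kind of trade-off the Information Bottleneck framework expresses, the sketch below evaluates the textbook Information Bottleneck Lagrangian, I(X;Z) - beta * I(Z;Y), for a small discrete example. It is not the algorithm developed in the project: the resource constraints, the RL component and representational planning are not modelled, and the function names, the toy distribution and the value of beta are all assumptions made for illustration only.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_a[a] * p_b[b]))
    return mi


def ib_objective(p_xy, encoder, beta):
    """Textbook IB Lagrangian I(X;Z) - beta * I(Z;Y).

    p_xy[x][y] is the joint distribution of stimulus x and relevance
    variable y; encoder[x][z] is p(z | x). Lower values are better.
    """
    n_x, n_y, n_z = len(p_xy), len(p_xy[0]), len(encoder[0])
    p_x = [sum(row) for row in p_xy]
    # p(x, z) = p(x) * p(z | x)
    p_xz = [[p_x[x] * encoder[x][z] for z in range(n_z)] for x in range(n_x)]
    # p(z, y) = sum_x p(x, y) * p(z | x), using the Markov chain Z - X - Y
    p_zy = [
        [sum(p_xy[x][y] * encoder[x][z] for x in range(n_x)) for y in range(n_y)]
        for z in range(n_z)
    ]
    return mutual_information(p_xz) - beta * mutual_information(p_zy)


# Hypothetical toy problem: four equiprobable stimuli, two outcome classes.
# Stimuli 0 and 1 always lead to outcome 0; stimuli 2 and 3 to outcome 1.
p_xy = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]

# Encoder that keeps every stimulus distinct (four codes).
identity = [[1.0 if z == x else 0.0 for z in range(4)] for x in range(4)]
# Encoder that merges stimuli sharing an outcome (two codes).
merged = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

beta = 2.0  # assumed trade-off weight, chosen only for illustration
cost_identity = ib_objective(p_xy, identity, beta)
cost_merged = ib_objective(p_xy, merged, beta)
print(cost_identity, cost_merged)
```

In this toy case both encoders preserve the same information about the outcome, but the merged encoder spends less information on the stimulus and so attains the lower objective. This is the sense in which a constrained agent would be expected to generalise across stimuli that share an outcome.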
An experimental paradigm has been developed that uses naturalistic images instead of simple geometric shapes, both to ensure that participants rely on the same perceptual processing as in real environments and to increase engagement. In addition to the policy learning task, the same participants completed a read-out task, in which they made a series of two-alternative forced-choice decisions between two images based on familiarity. Through careful design of the feature statistics of the presented image pairs, this task allows the content of the representation learned by the participant to be assessed.
It has been established which parametrisations of the feature space allow participants to learn the association with reward gradually, that statistical learning of even nontrivial properties of the training set takes place during the completion of the paradigm, and that performance in the two tasks correlates in a way that supports the hypothesised dual learning process. Stimulus sets have been developed to assess whether reward properties in the learning task modulate statistical learning in accordance with a resource-rational learning procedure, providing the basis of an ongoing investigation. A second paradigm has been developed based on the tangram game. This choice directly supports testing compositional representations in a simplified but game-like setting, and the paradigm is designed to allow variants to be produced easily, for example to test for temporally extended planning or habituation in terms of representations.
The work has been presented at the Reinforcement Learning and Decision Making (RLDM) 2022 conference at Brown University (Providence, USA), as well as in invited lectures at Princeton University (USA), Brown University and the Central European University (Budapest, Hungary). Manuscripts detailing my work during the Fellowship as well as follow-up projects are being prepared for publication.