Content archived on 2024-06-18

Developmental trajectories for model-free and model-based reinforcement learning: computational and neural bases

Final Report Summary - DEVELOP-LEARNING (Developmental trajectories for model-free and model-based reinforcement learning: computational and neural bases)

2014
In 2014 we carried out the first behavioural experiment. The experiment consisted in administering to cohorts of adults and adolescents a novel instrumental learning task involving basic reward and punishment learning, as well as learning from counterfactual (fictive) information. Two key findings emerged from our data: adolescents' performance was not enhanced by counterfactual information; and, compared to adults, adolescents learned preferentially from reward rather than from punishment, whereas adults learned equally from both. To formalise these findings, we employed computational models to analyse the behavioural data. Our computational analyses revealed that adults and adolescents do not implement the same algorithm to solve the learning task: the behavioural results were explained by the incomplete development in adolescence of the computational modules necessary to simultaneously process alternative outcomes (counterfactuals) and to compute negative outcomes relative to their context (relative value). To our knowledge, ours is the first study to characterise this cognitive development in terms of computational maturation, using Bayesian model comparison techniques as well as model simulations.
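To illustrate the kind of model involved, the following Python sketch (written under our own simplifying assumptions, not the exact model fitted in the study) simulates a Q-learning agent in which a counterfactual module updates the unchosen option and outcomes are coded relative to a running context value; switching the counterfactual module off mimics the "immature" algorithm. Parameter names (alpha, alpha_cf, beta) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta):
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

def run_agent(n_trials=200, p_reward=(0.75, 0.25),
              alpha=0.3, alpha_cf=0.3, beta=3.0, counterfactual=True):
    """Simulate a two-armed bandit with complete (factual + forgone) feedback."""
    q = np.zeros(2)        # option values
    v_context = 0.0        # running estimate of the context value
    correct = 0
    for _ in range(n_trials):
        c = rng.choice(2, p=softmax(q, beta))    # chosen option
        u = 1 - c                                # unchosen option
        r_c = float(rng.random() < p_reward[c])  # factual outcome
        r_u = float(rng.random() < p_reward[u])  # forgone (counterfactual) outcome
        v_context += 0.1 * (0.5 * (r_c + r_u) - v_context)
        q[c] += alpha * ((r_c - v_context) - q[c])        # relative-value coding
        if counterfactual:
            q[u] += alpha_cf * ((r_u - v_context) - q[u])
        correct += (c == 0)
    return correct / n_trials

# The agent lacking the counterfactual module typically identifies the better
# option (option 0) more slowly.
print(run_agent(counterfactual=True), run_agent(counterfactual=False))
```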

2015
Study 1 investigated valence-induced reinforcement learning biases in stable, as well as variable, environments. This study involved two experiments. The first experiment (N=20 adult participants) investigated biases in learning from obtained outcomes (i.e. factual learning). The second experiment (N=20 adult participants) investigated biases in learning from forgone outcomes (i.e. counterfactual learning). The results showed that both factual and counterfactual learning are biased in a valence-dependent manner. The corresponding article is in preparation for submission to a scientific journal.
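A minimal sketch of what a valence-dependent bias can look like computationally, assuming (for illustration only) an asymmetric update rule in which positive and negative prediction errors carry different learning rates:

```python
import random

def asymmetric_update(q, outcome, alpha_pos=0.4, alpha_neg=0.1):
    """One valence-biased update: positive and negative prediction errors
    are weighted by different learning rates (illustrative parameters)."""
    delta = outcome - q                      # prediction error
    return q + (alpha_pos if delta > 0 else alpha_neg) * delta

random.seed(1)
q = 0.5
for _ in range(1000):
    q = asymmetric_update(q, 1.0 if random.random() < 0.5 else 0.0)
print(round(q, 2))   # typically well above 0.5: an "optimistic" value estimate
```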

Study 2 investigated how individual reinforcement learning strategies (as defined by computational modeling) can be influenced by concomitant social tasks. We explored the process of interest, which we term "human computational mimesis", in two ways. In a first experiment (the "prediction game"; N=20), adult participants had to learn to predict the behavior of another (virtual) subject that performed a reinforcement learning task using a different computational strategy. In a second experiment (the "advice game"; N=20), adult participants, while performing an instrumental task, received advice from another (virtual) subject using a different computational strategy. The data are currently under analysis to test the hypothesis that social tasks can modify the computational strategy originally implemented by a subject ("mimesis").
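One common way to label a participant's computational strategy, sketched below under our own assumptions rather than as the study's analysis pipeline, is to fit candidate models to the choice sequence and compare them by BIC; re-fitting on pre- versus post-interaction blocks would then reveal a strategy shift. The two candidate models here (Q-learning versus a noisy win-stay/lose-shift heuristic) are illustrative choices.

```python
import numpy as np

def loglik_qlearning(params, choices, outcomes):
    """Log-likelihood of a choice sequence under a simple Q-learning model."""
    alpha, beta = params
    q, ll = np.zeros(2), 0.0
    for c, r in zip(choices, outcomes):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        ll += np.log(p[c] + 1e-12)
        q[c] += alpha * (r - q[c])
    return ll

def loglik_wsls(params, choices, outcomes):
    """Log-likelihood under a noisy win-stay/lose-shift heuristic."""
    (eps,) = params                      # probability of deviating from WSLS
    ll, prev_c, prev_r = 0.0, None, None
    for c, r in zip(choices, outcomes):
        if prev_c is None:
            p = 0.5
        else:
            predicted = prev_c if prev_r == 1 else 1 - prev_c
            p = 1 - eps if c == predicted else eps
        ll += np.log(p + 1e-12)
        prev_c, prev_r = c, r
    return ll

def bic(ll, n_params, n_trials):
    return n_params * np.log(n_trials) - 2 * ll

def fit(loglik, grid, choices, outcomes):
    """Crude grid-search maximum likelihood (a stand-in for a real optimiser)."""
    return max(loglik(p, choices, outcomes) for p in grid)

choices  = [0, 0, 1, 0, 0, 1, 1, 0]      # toy choice/outcome sequences
outcomes = [1, 1, 0, 1, 0, 0, 1, 1]
grid_q = [(a, b) for a in np.linspace(0.05, 0.95, 10) for b in (1.0, 3.0, 10.0)]
grid_w = [(e,) for e in np.linspace(0.05, 0.45, 9)]
bic_q = bic(fit(loglik_qlearning, grid_q, choices, outcomes), 2, len(choices))
bic_w = bic(fit(loglik_wsls, grid_w, choices, outcomes), 1, len(choices))
print("Q-learning BIC:", round(bic_q, 2), " WSLS BIC:", round(bic_w, 2))
```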

Study 3 investigated observational, as opposed to reinforcement, learning. Observational learning is learning from observing others' behavior (it is similar to imitation). To investigate observational learning and compare it with reinforcement learning, we designed a behavioral task that we administered to two cohorts of subjects (N=51 adult participants in the "discovery" sample and N=51 different adult participants in the "replication" sample) via the online testing platform Mechanical Turk. We will design a new computational model of observational learning and correlate different computational measures (such as learning rates) with psychometric measures. The data are currently under analysis to test the hypothesis that observational learning is affected in subjects with high anxiety.
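A minimal sketch of the observational-learning idea (an assumption for illustration, not the model that will be fitted): the learner moves its action values toward the action it just saw the demonstrator take, and the resulting imitation learning rate is the kind of parameter that could then be correlated with psychometric scores such as trait anxiety.

```python
import numpy as np

def observational_update(values, observed_action, alpha_obs=0.2):
    """Move the value of the demonstrated action toward 1 and the others toward 0."""
    target = np.zeros_like(values)
    target[observed_action] = 1.0
    return values + alpha_obs * (target - values)

values = np.zeros(2)
for a in [0, 0, 1, 0, 0]:          # the demonstrator mostly chooses option 0
    values = observational_update(values, a)
print(values.round(2))             # option 0 accumulates the higher value
```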

Submitted or in Revision
1. Palminteri S, Kilford EJ, Coricelli G, Blakemore SJ. The computational development of reinforcement learning during adolescence. PLoS Computational Biology (in minor revision).

Study description: We employed a novel learning task to investigate how adolescents and adults learn from reward versus punishment, and from counterfactual feedback about their decisions. Computational analyses revealed that adults and adolescents did not implement the same algorithm to solve the learning task. In contrast to adults, adolescents' performance did not take into account counterfactual information; adolescents also learned preferentially to seek rewards rather than to avoid punishments, whereas adults learned to seek and avoid both equally. Increasing our understanding of computational changes in reinforcement learning during adolescence may provide insights into adolescent value-based decision-making. Our results might also have implications for education, since they suggest that adolescents benefit more from positive feedback than from negative feedback in learning tasks.

2. Lefebvre G, Lebreton M, Meyniel F, Bourgeois-Gironde S, Palminteri S. Asymmetric reinforcement learning: computational and neural bases of positive life orientation. (submitted).
3. Lebreton M, Palminteri S. Assessing inter-individual variability in brain-behavior relationship with functional neuroimaging. (submitted).
4. Giavazzi M, Daland R, Palminteri S, Peperkamp S, Debernard L, Brugieres P, Jacquemot C, Schramm C, Cleret de Langavant L, Bachoud-Lévi AC. The causal role of the striatum in linguistic selection: evidence from Huntington’s disease and computational modeling. (submitted).

Already published
5. Delorme C, Salvador A, Valabrègue R, Roze E, Palminteri S, Vidailhet M, de Wit S, Robbins T, Hartmann A, Worbe Y. Enhanced habit formation in Gilles de la Tourette syndrome. Brain (2016).
6. Palminteri S, Khamassi M, Joffily M, Coricelli G. Contextual modulation of value signals in reward and punishment learning. Nature Communications (2015).
7. Vinckier F, Gaillard R, Palminteri S, Salvador A, Rigoux L, Fornito A, Adapa R, Krebs MO, Pessiglione M, Fletcher PC. Confidence and Psychosis: a neuro-computational account of contingency learning disruption by NMDA blockade. Molecular Psychiatry (2015).
8. Worbe Y, Palminteri S, Savulich G, Daw ND, Fernandez-Egea E, Robbins TW, Voon V. Valence-dependent effects of serotonin depletion on model-based and model-free choice strategies. Molecular Psychiatry (2015).