CORDIS - EU research results

Characterizing information integration in reinforcement learning: a neuro-computational investigation

Periodic Reporting for period 2 - INFORL (Characterizing information integration in reinforcement learning: a neuro-computational investigation)

Reporting period: 2023-03-01 to 2024-08-31

Reinforcement learning (RL) characterizes how we adaptively learn, by trial and error, to select actions that maximize the occurrence of rewards and minimize the occurrence of punishments. The behavioural, computational and neurobiological features of reinforcement learning have been extensively studied in humans and other animals, but mostly in a standard context where the decision-maker only faces one outcome (usually a reward) associated with the option they chose. As a consequence, little is known about how we prioritize, filter or value richer and more complex outcome information in RL, and how we subjectively evaluate the quality of the information that supports our decisions. This project proposes to address this gap, hypothesizing that humans do learn from complex outcome information (multiple samples), but that computational limitations and affective biases curb information integration.
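The standard, single-outcome setting described above is typically modeled with a delta-rule value update driven by a reward prediction error. The sketch below illustrates that general idea only; the function names, learning rate and reward probability are hypothetical, not the project's fitted model.

```python
# Minimal sketch of a standard delta-rule reinforcement-learning update
# in a single-outcome bandit setting. All names and parameter values are
# illustrative assumptions, not the project's actual model.
import random

def update_value(value, outcome, alpha=0.3):
    """Move the option's value estimate toward the observed outcome."""
    prediction_error = outcome - value  # reward prediction error
    return value + alpha * prediction_error

# Example: learning the value of an option that pays 1 with probability 0.8.
random.seed(0)
v = 0.0
for _ in range(200):
    outcome = 1.0 if random.random() < 0.8 else 0.0
    v = update_value(v, outcome)
print(round(v, 2))  # settles near the option's true expected reward
```

With a fixed learning rate, the estimate fluctuates around the option's expected reward rather than converging exactly, which is what makes such models sensitive to which outcomes are shown and how they are weighted.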
The prioritization, filtering and biased integration of the information carried by the outcomes of our decisions may underpin critical (and undesirable) behavioral phenomena such as confirmatory biases and overconfidence, and ultimately complex social phenomena such as political polarization.
The objective of our project is to investigate these cognitive processes in a well-controlled laboratory environment, and to decipher the behavioral, computational and neurobiological aspects of information integration in reinforcement learning, together with its biases and limitations.
To start the project, we comprehensively re-evaluated the impact of key extensions of standard reinforcement-learning setups featured in our recent work: the manipulation of the valence of choice outcomes (reward vs punishment), the manipulation of the quantity of information (a simple outcome vs also revealing the outcome associated with the unchosen option), and the elicitation of confidence in choices. This conceptual and empirical investigation established and consolidated the existence of critical information-integration biases and limitations: namely, that 1) information integration is context-dependent; 2) information is integrated in a confirmatory way; and 3) confidence in choice accuracy is dominated by the information related to the chosen option, leading to overconfidence and valence-dependent biases in confidence (i.e. confidence is higher in gain than in loss frames). We have proposed computational models that explain these three observations and account for a large amount of empirical data. Using neuroimaging, we have dissociated two brain networks that seem to integrate two kinds of information during reinforcement learning: a dorsal parieto-frontal network that seems to encode relatively unbiased information, akin to learning uncertainty, and a ventro-limbic prefrontal network that integrates the biases in information integration.
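Confirmatory integration of this kind is commonly formalized by weighting prediction errors asymmetrically: evidence that confirms the current choice (good news about the chosen option, bad news about the unchosen one) is learned from more strongly than disconfirming evidence. The sketch below is one illustrative variant of such an update rule; the parameter values and names are hypothetical and not the project's fitted model.

```python
# Illustrative sketch of a confirmatory-update rule: prediction errors that
# confirm the current choice are weighted more heavily than disconfirming
# ones. Parameter values and names are assumptions for illustration.

def confirmatory_update(q_chosen, q_unchosen, r_chosen, r_unchosen,
                        alpha_conf=0.4, alpha_disc=0.1):
    pe_c = r_chosen - q_chosen      # prediction error, chosen option
    pe_u = r_unchosen - q_unchosen  # prediction error, unchosen option
    # Confirming evidence: the chosen option did well (pe_c > 0)
    # or the unchosen option did badly (pe_u < 0).
    a_c = alpha_conf if pe_c > 0 else alpha_disc
    a_u = alpha_conf if pe_u < 0 else alpha_disc
    return q_chosen + a_c * pe_c, q_unchosen + a_u * pe_u

# Two equally rewarding options: after one identical outcome for each,
# confirmatory weighting inflates the value of the chosen option.
qc, qu = 0.5, 0.5
qc, qu = confirmatory_update(qc, qu, r_chosen=1.0, r_unchosen=1.0)
print(qc, qu)  # the chosen option's value rises faster than the unchosen one's
```

Even with objectively identical outcomes, this asymmetry makes the chosen option look better over time, which is one way a confirmatory integration scheme can produce overconfidence in past choices.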
We are now designing (even) more complex reinforcement-learning setups in which information is even richer, which we will use to investigate the cost-benefit tradeoffs associated with integrating (more) information in reinforcement learning.
Biases and limitations in human reinforcement learning have been surprisingly neglected, especially when compared with the vast literature documenting biases and heuristics in reasoning and decision-making. Our line of work constitutes clear progress toward better documenting human behavior in reinforcement learning, beyond its standard, normative description.
By developing reinforcement-learning paradigms that feature richer and more complex information, we expect to identify and characterize key tradeoffs between the benefits of integrating more information to guide behavior and the computational costs engaged in integrating and processing this information.