Higher-order motor control of stochastic behavior in an uncertain environment

Periodic Reporting for period 1 - MOTORHEAD (Higher-order motor control of stochastic behavior in an uncertain environment)

Reporting period: 2022-10-01 to 2025-03-31

Decision-making behavior requires the selection and adaptation of goals and actions that maximize reward. It occurs in the absence of clear instructions to guide action and is instead driven by the subjective value of each action, built on past experience. To support such behavior, theories predict that the brain must compute a deterministic "decision-value", that is, the difference between the values of the available actions. Although this implies that the action with the highest value should always be chosen, behavior is often stochastic, with variability from trial to trial. This ERC project aims to achieve an unprecedented level of understanding of these "unpredictable" behaviors. Higher-order motor areas, which include the secondary motor cortex (M2) in rodents, appear to sit at the top of the hierarchy that optimally orchestrates action selection, and local computation in M2 possibly links decision-value to future motor action. Even if the decision-value is deterministic, it only describes behavioral statistics, not individual choices in single trials. The project aims to characterize the integrative role of M2 during decision-making by investigating its local computation and its dynamic sub-cortical inputs and outputs while mice learn by trial and error. Objectives: 1) To probe the hierarchical neural operations in M2 that funnel the decision-value signal to output-specific layer 5 (L5) pyramidal tract neurons. 2) To probe the bottom-up modulation of decision-value and motor command in M2 by subcortical structures projecting to M2. We propose a systems neuroscience approach to illuminate the cellular and synaptic principles underlying the control and transformation of decision variables in local and long-range motor-related circuits as behavioral strategies change.
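As an illustrative formulation only (the exact model used in the project may differ), the decision-value can be written as the difference between the two action values, with a softmax mapping from value to choice probability, so that a deterministic decision-value is still compatible with stochastic, trial-to-trial variable choices:

```latex
\[
d = Q_{1} - Q_{2}, \qquad
P(\text{action } 1) = \frac{1}{1 + e^{-\beta d}}
\]
```

Here Q1 and Q2 denote the action values and beta is an assumed inverse-temperature parameter: as beta grows, the higher-valued action is chosen almost always, whereas finite beta reproduces trial-to-trial variability around the behavioral statistics that the decision-value describes.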
We designed a deterministic, two-option reversal learning task in which head-restrained mice learn a reinforced, self-initiated lever-pressing preference without any external cue indicating the rewarded action. Our task requires mice to actively engage both forepaws in making choices, with each decision involving a specific action: a press with either the right or the left forepaw. Unlike recent studies in M2 where animals lick or press a single lever to make a choice, mice in our task could select lever 1 (L1), lever 2 (L2), or both simultaneously (L1&L2). Notably, the frequency of pressing both levers simultaneously increased when the mice explored both levers equally. Yet a key aspect of our task is that pressing L1&L2 never results in a reward and, according to basic reinforcement learning (RL) principles, should not be reinforced. To account for this, we developed a computational "race" model between actions, in which the delay in executing each individual action is linked to its value. Our model accurately captured the animal's lever-pressing strategy and reward rate, and provided access to lever-pressing values Q updated according to Rescorla-Wagner-type equations. The fact that the model predicted the observed delays without fitting them, together with the observation that the delay scales inversely with the sum of lever-pressing values, supports its robustness and extends its validity beyond a simple regression capability. Using two-photon calcium imaging of M2 neuronal population activity combined with behavioral modeling and optogenetics, we showed that M2 encodes information about decision-values through persistent population activity, which could serve as a signal dictating the probability of taking each action. By recording the same neurons throughout the learning process, from naïve to expert stages, we observed that persistent coding evolves gradually from trial to trial, reflecting how the decision-value is updated after each action-outcome pair. This, in turn, determines the rate at which learning occurs and is reversed when the reward contingency changes unexpectedly. These results highlight the use of decision-values by M2 to adapt choice during initial learning without instructive cues.
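A minimal sketch of how such a race model could be implemented is given below, assuming Rescorla-Wagner-type value updates and exponentially distributed initiation delays whose rates scale with action value; the parameter names, values, and the exponential form are assumptions for illustration, not the project's exact model. With this form, the delay of the winning press scales inversely with the sum of the lever-pressing values, consistent with the observation described above.

```python
import numpy as np

rng = np.random.default_rng(1)

alpha = 0.1                   # assumed learning rate of the value update
tau0 = 2.0                    # assumed baseline delay scale (s)
Q = {"L1": 0.1, "L2": 0.1}    # initial lever-pressing values

def race(Q):
    """Each action races to initiation; higher value -> shorter expected delay.
    The earliest press wins. Since the minimum of exponentials has a rate equal
    to the sum of the rates, the winning delay scales inversely with Q_L1 + Q_L2."""
    delays = {a: rng.exponential(tau0 / max(q, 1e-6)) for a, q in Q.items()}
    action = min(delays, key=delays.get)
    return action, delays[action]

def update(Q, action, reward):
    """Rescorla-Wagner-type update of the chosen action's value."""
    Q[action] += alpha * (reward - Q[action])

rewarded = "L1"                # current contingency; reversed mid-session below
for trial in range(400):
    if trial == 200:
        rewarded = "L2"        # unexpected reversal of the reward contingency
    action, delay = race(Q)
    update(Q, action, 1.0 if action == rewarded else 0.0)
```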

Whether decision-value is converted into a binary motor command remains an open and critical question. It may arise through long-range loops between multiple brain regions such as the thalamus, midbrain, cerebellum, and basal ganglia, as in memory-guided licking tasks in mice. Here we explored whether the basolateral amygdala (BLA), traditionally recognized for its role in associative fear learning, also contributes to self-initiated, incentive-motivated behaviors. The rules by which the BLA contributes to learning to initiate initially neutral actions for a positive outcome remain unclear. In particular, although the mouse secondary motor cortex (M2), a key region involved in spontaneous action initiation, is a major target of BLA glutamatergic outputs, it is unknown whether and how BLA-to-M2 communication participates in the self-initiation of incentive-motivated actions. To address these questions, we trained head-fixed mice to press two initially neutral levers to obtain a water reward. Suppressing BLA-to-M2 synaptic signals by tetanus toxin expression revealed that they are key to rapid behavioral learning. Consistently, we characterized how this synaptic communication scales with learning speed using two-photon microscopy of BLA axonal boutons in M2 during the task. Imaging experiments also revealed functional assemblies of boutons activated at distinct steps of the behavior, suggesting well-separated roles: 1) controlling press initiation, 2) discriminative reporting of lever pressing, and 3) reporting licking. Longitudinal imaging of the same axons revealed that single-bouton activity was stable for more than two weeks. Finally, when we devalued the preferred lever, animals learnt to reverse their lever preference, and the level of preparatory activity before a press scaled with the preference for the chosen lever, suggesting that BLA-to-M2 communication participates in value-based action selection on top of the initial incentive-motivated behavioral learning.
1. We designed a set of mathematical and computational techniques specialized for the analysis of time-lapse microscopy images of neurite activity in small behaving animals. We provide these techniques as a free and open-source library to support the community's efforts in advancing in vivo microscopy of neurite activity (an illustrative sketch of one such routine is given after this list).
2. Our evidence suggests that decision-coding cells in M2 gradually develop their specificity over the course of trials as naïve mice learn and improve their performance. We also found that the mice that learned the fastest exhibited the highest levels of M2 activity along the decision axis, highlighting a crucial relationship between decision-value and the adaptation of choice behavior, which represents a novel finding.
3. The mouse secondary motor cortex (M2), a critical region for spontaneous action initiation, is a major target of basolateral amygdala (BLA) glutamatergic outputs. However, the role of BLA-to-M2 communication in self-initiating incentive-motivated actions has remained unclear. Our research sheds light on this interaction, providing key insights into how the BLA supports learning to initiate previously neutral actions to achieve positive outcomes.
4. Our findings reveal that respiration-defined packets serve as fundamental, fine-grained temporal units of brain activity during NREM sleep. This discovery holds the potential to significantly advance our understanding of brain function and could represent a major breakthrough for patient care during surgery.
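As referenced in item 1, the sketch below illustrates one routine that such an analysis library typically provides: converting a raw fluorescence trace from a neurite or bouton ROI into ΔF/F. The function name, the sliding-percentile baseline, and the parameter values are illustrative assumptions, not the actual API of the released library.

```python
import numpy as np

def delta_f_over_f(trace, baseline_percentile=10, window=300):
    """Convert a raw fluorescence trace into dF/F using a sliding
    low-percentile baseline (illustrative routine, not the library's API)."""
    trace = np.asarray(trace, dtype=float)
    # Baseline at each frame: a low percentile over the preceding `window`
    # frames, which tracks slow drift while ignoring brief calcium transients.
    baseline = np.array([
        np.percentile(trace[max(0, t - window):t + 1], baseline_percentile)
        for t in range(trace.size)
    ])
    return (trace - baseline) / np.maximum(baseline, 1e-9)

# Example on a synthetic bouton trace with slow drift and sparse transients.
rng = np.random.default_rng(0)
frames = np.arange(3000)
raw = 100 + 0.01 * frames + 20 * (rng.random(frames.size) < 0.01) + rng.normal(0, 1, frames.size)
dff = delta_f_over_f(raw)
```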