CORDIS - EU research results

Elucidating the Basal Ganglia Circuits for Reward Expectation

Periodic Reporting for period 4 - ExpectBG (Elucidating the Basal Ganglia Circuits for Reward Expectation)

Reporting period: 2024-07-01 to 2024-12-31

Predicting future outcomes is fundamental for adaptive behaviour. If we can predict the outcome of our actions, we can choose the best course of action; if we cannot, the outcome will be a surprise that we use to learn for the future. Reward predictions, for example, are crucial for learning because they can be compared to actual outcomes to determine whether an outcome was better or worse than expected. Even though reward expectation signals are observed in many areas of the brain, how they are computed remains unknown. The main reason for this lack of progress is the absence of a clear understanding of where expectation is generated and which circuits are involved in its computation. Consequently, we are missing the prerequisite knowledge for determining where reward expectation arises, how it is computed, and how expectations are learnt. Equally important for understanding learning is determining what type of prediction errors the brain uses for learning. The aim of this work is to determine what type of prediction errors the brain computes and where expectations (predictions) are formed in the brain. Knowing how the brain learns to make predictions is critical, as these processes are disrupted in many psychiatric diseases such as depression and schizophrenia.
Classically, dopamine neurons that encode a reward prediction error (RPE) are thought to provide the critical teaching signal that drives reinforcement learning. Surprisingly, we discovered that stable learning is driven by a novel movement-based dopaminergic teaching signal, which we named the action prediction error (APE). This error signal reflects the difference between the action that is taken and the extent to which that action was predicted in a given situation; in other words, it reflects how surprising it is that an action is taken in a particular context. We have shown that this prediction error serves as a value-free teaching signal that works in concert with the canonical RPE to support learning. These two teaching signals give rise to a dual-learning model in which RPEs update the value of action-outcome associations and drive initial learning, while in parallel APEs update how often a stimulus-response association has been performed. Taken together, we have discovered a new type of dopaminergic teaching signal and provided the first evidence that the basal ganglia are structured to implement a dual value-based/value-free learning system. These results have now been published (Greenstreet et al., Nature, 2025).
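To make the dual-learning idea concrete, the following is a minimal sketch of an agent that combines a value-based update driven by RPEs with a frequency-based update driven by APEs. The class name, parameters, and exact update rules here are illustrative assumptions for a simple multi-alternative choice, not the published model.

```python
class DualLearner:
    """Hypothetical sketch: value-based (RPE) plus frequency-based (APE) learning."""

    def __init__(self, n_actions, alpha_q=0.1, alpha_h=0.1, w=0.5):
        self.q = [0.0] * n_actions   # action values, updated via RPE
        self.h = [0.0] * n_actions   # action frequencies ("habit"), updated via APE
        self.alpha_q = alpha_q       # value-learning rate (assumed)
        self.alpha_h = alpha_h       # frequency-learning rate (assumed)
        self.w = w                   # weight on the value-based system (assumed)

    def choose(self):
        # Combine value-based and frequency-based preferences for each action.
        scores = [self.w * q + (1 - self.w) * h for q, h in zip(self.q, self.h)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, action, reward):
        # Reward prediction error: outcome minus expected value.
        rpe = reward - self.q[action]
        self.q[action] += self.alpha_q * rpe
        # Action prediction error: whether the action was taken (1 or 0)
        # minus how strongly it was expected in this context.
        for a in range(len(self.h)):
            taken = 1.0 if a == action else 0.0
            self.h[a] += self.alpha_h * (taken - self.h[a])
        return rpe
```

With repeated rewarded choices of one action, both systems converge on it: its value `q` rises through RPEs while its frequency `h` rises through APEs, so the habitual system alone would eventually suffice to select it even if reward were removed.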

In a separate aspect of the project, we investigated the role sleep plays in forming predictions and driving learning. Sleep is critical for consolidating all forms of memory, from remembering episodic experiences to the development of motor skills. A core feature of this consolidation process is the offline replay of neuronal firing patterns that occur during awake experience. This replay is thought to originate in the hippocampus and trigger the reactivation of interconnected ensembles of cortical and subcortical neurons. However, non-declarative memories do not require the hippocampus for learning or for sleep-dependent consolidation, so what drives their sleep-dependent consolidation is unknown. Here we show that replay occurs during offline consolidation of a non-declarative, procedural memory and that this replay of procedural experience is generated independently of the hippocampus. Using an unsupervised method, we found that neural sequences are replayed in the striatum in a compositional manner, with each type of neural sequence replayed individually or in combination. The replay occurred at both real-world and time-compressed speeds and was prioritised both at the level of individual neurons and at the level of the type of neural sequence. Complete bilateral lesions of the hippocampus had no effect on any feature of this replay. Our results demonstrate that procedural replay during the consolidation of a non-declarative memory is independent of the hippocampus. These results support the view that replay drives active consolidation of all types of memory during sleep but challenge the idea that the hippocampus is the source of this replay. They will prompt investigation into alternative mechanisms for replay generation and new theories of offline memory consolidation. The results of this study are available as a preprint (Thompson et al., bioRxiv, 2024).
Overall, we have made two discoveries in this project. The first is that the brain has two parallel mechanisms for learning and decision making: a value-based and a frequency-based learning system. The value-based system uses reward prediction error signals to update the value of actions, allowing animals to choose the most valuable alternative when making a decision. The frequency-based system uses action prediction error signals to update how often an action has been performed in a given context, allowing animals to choose the most common alternative when making a decision.

Our second discovery is that there are multiple parallel mechanisms that allow different memory systems to replay the neural activity related to awake experience. This offline replay during sleep is critical for forming predictions and for consolidating memory. Specifically, we showed that procedural replay occurs independently of the hippocampus, revealing that replay-driven memory consolidation operates through parallel, independent mechanisms tailored to distinct memory systems.
Dual-dopaminergic prediction errors support different learning strategies