Classically, dopamine neurons that encode a reward prediction error (RPE) are thought to provide the critical teaching signal that drives reinforcement learning. Surprisingly, we discovered that stable learning is driven by a novel movement-based dopaminergic teaching signal, which we named the action prediction error (APE). This error signal reflects the difference between the action that is taken and the extent to which that action was predicted in a given situation. In other words, it reflects how surprising it is that an action is taken in a particular context. We’ve shown that this prediction error serves as a value-free teaching signal that works in concert with the canonical RPE to support learning. These two teaching signals give rise to a dual-learning model in which RPEs are used to update the value of action-outcome associations and drive initial learning. In parallel, APEs update how often a stimulus-response association has been performed. Taken together, we’ve discovered a new type of dopaminergic teaching signal and have provided the first evidence that the basal ganglia are structured to implement a dual value-based/value-free learning system. These results have now been published (Greenstreet et al., Nature, 2025).
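As an illustration only, the dual-learning idea described above can be sketched as two parallel tabular updates. This is a minimal sketch under our own assumptions, not the model fitted in the published study: the function names, the learning rates (alpha_rpe, alpha_ape), the softmax action rule and its weights are all hypothetical choices made for the example. The RPE compares the received reward with the current value of the chosen action, whereas the APE compares what was done (1 for the chosen action, 0 otherwise) with how strongly each action was expected in that state, and never looks at reward.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_update(Q, H, state, action, reward, alpha_rpe=0.1, alpha_ape=0.05):
    """Apply one value-based (RPE) and one value-free (APE) update in place.

    Q[state][action]: learned value of taking `action` in `state`.
    H[state][action]: how strongly `action` is expected in `state`,
                      i.e. a running estimate of how often it is taken.
    All names and learning rates are illustrative assumptions.
    """
    # Value-based system: reward prediction error on the chosen action.
    rpe = reward - Q[state][action]
    Q[state][action] += alpha_rpe * rpe

    # Value-free system: action prediction error for every action in this
    # state. The reward is deliberately not used here.
    apes = []
    for a in range(len(H[state])):
        taken = 1.0 if a == action else 0.0
        ape = taken - H[state][a]
        H[state][a] += alpha_ape * ape
        apes.append(ape)
    return rpe, apes


def choose_action(Q, H, state, rng, w_value=1.0, w_habit=1.0, beta=3.0):
    """Sample an action from a softmax over a weighted sum of Q and H.

    Combining the two systems this way is an assumption of the sketch.
    """
    drive = [beta * (w_value * q + w_habit * h)
             for q, h in zip(Q[state], H[state])]
    probs = softmax(drive)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

In this sketch, repeating the same action in the same state drives its entry in H towards 1, so the APE for that action shrinks with repetition regardless of whether the action is rewarded, while Q continues to track reward.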
In a separate aspect of the project, we investigated the role sleep plays in forming predictions and driving learning. Sleep is critical for consolidating all forms of memory, from remembering episodic experience to the development of motor skills. A core feature of this consolidation process is the offline replay of neuronal firing patterns that occur during awake experience. This replay is thought to originate in the hippocampus and trigger the reactivation of interconnected ensembles of cortical and subcortical neurons. However, non-declarative memories do not require the hippocampus for learning or for sleep-dependent consolidation, meaning that what drives their consolidation during sleep is unknown. Here, we show that replay occurs during offline consolidation of a non-declarative, procedural memory and that this replay of procedural experience is generated independently of the hippocampus. Using an unsupervised method, we found that neural sequences are replayed in the striatum in a compositional manner, with each type of neural sequence replayed individually or in combination. The replay occurred at both real-world and time-compressed speeds and was prioritised at the level of both the individual neurons and the type of neural sequence. Complete bilateral lesions of the hippocampus had no effect on any feature of this replay. Our results demonstrate that procedural replay during the consolidation of a non-declarative memory is independent of the hippocampus. These results support the view that replay drives active consolidation of all types of memory during sleep but challenge the idea that the hippocampus is the source of this replay. They will prompt investigation into alternative mechanisms for replay generation and new theories of offline memory consolidation. The results of this study are available as a preprint (Thompson et al., bioRxiv, 2024).