Periodic Reporting for period 1 - BAYES-RL (Deep Bayesian Reinforcement Learning -- Unifying Perception, Planning, and Control)
Reporting period: 2022-04-01 to 2024-09-30
To this end, we shall develop neural network representations of uncertainty, and algorithms that estimate uncertainty from data. We will develop theory and algorithms for decision making under uncertainty, bringing a fresh perspective to the problem based on Bayesian reinforcement learning (Bayesian RL, also termed meta RL). These advances will allow us to study safety certificates for deep RL, and to develop a general and practical methodology for learning-based robotic manipulation under uncertainty, validated via real-robot experiments.
The work is organized around four main objectives:
1. Develop Models and Algorithms for Learning Neural Belief Representations - this objective centers on developing representation learning methodologies to be used in Bayesian RL.
2. Scale Up the Framework of Bayesian RL Using Deep Learning - this objective centers on developing (Bayesian) deep RL algorithms.
3. Explore Safety Certificates for Deep RL - this objective centers on safety and interpretability aspects of Bayesian RL.
4. Develop a Practical Deep Learning Framework for Robotic Manipulation - this objective will exploit the methodologies developed in the other three objectives to investigate applications in robotic manipulation.
Objective 1: Develop Models and Algorithms for Learning Neural Belief Representations
a. We developed a contrastive learning approach for learning neural belief representations, establishing the theory and practice of this approach (a schematic sketch of the idea is given after this list). [Choshen and Tamar, ContraBAR: Contrastive Bayes-adaptive deep RL. ICML 2023]
b. We developed the deep latent particle (DLP) structured image representation, a breakthrough that uses image keypoints as latent variables in an image representation. This makes it possible to represent the intricate interactions between several objects in images and video, and we have used it to obtain state-of-the-art results in object-centric image generation, video prediction, and reinforcement learning. [Daniel and Tamar, Unsupervised image representation learning with deep latent particles. ICML 2022; Daniel and Tamar, DDLP: Unsupervised object-centric video prediction with deep dynamic latent particles. TMLR 2023; Haramati et al., Entity-centric reinforcement learning for object manipulation from pixels. ICLR 2024]
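To make the contrastive belief-learning idea of item a concrete, below is a minimal PyTorch sketch of CPC-style training: a recurrent encoder summarizes the interaction history into a belief vector, trained with an InfoNCE objective to score the trajectory's own future observation above futures taken from other trajectories in the batch. All module names, dimensions, and the random toy data are illustrative assumptions, not the ContraBAR codebase.

# Minimal sketch of CPC-style contrastive belief learning (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefEncoder(nn.Module):
    """Summarizes the (observation, action, reward) history into a belief vector."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 1, hidden, batch_first=True)
        self.project = nn.Linear(hidden, hidden)

    def forward(self, obs, act, rew):
        x = torch.cat([obs, act, rew], dim=-1)   # (B, T, obs_dim + act_dim + 1)
        h, _ = self.gru(x)                       # (B, T, hidden)
        return self.project(h[:, -1])            # belief after the full history

def info_nce_loss(belief, future_emb):
    """InfoNCE: each trajectory's true future is the positive; the other
    trajectories' futures in the batch serve as negatives."""
    logits = belief @ future_emb.t()             # (B, B) similarity matrix
    labels = torch.arange(belief.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random data, to show the shapes involved.
B, T, obs_dim, act_dim = 32, 10, 8, 2
enc = BeliefEncoder(obs_dim, act_dim)
obs_head = nn.Linear(obs_dim, 128)               # embeds candidate future observations
belief = enc(torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim), torch.randn(B, T, 1))
loss = info_nce_loss(belief, obs_head(torch.randn(B, obs_dim)))
loss.backward()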
Objective 2: Scale Up the Framework of Bayesian RL Using Deep Learning
a. We initiated the theoretical study of meta RL, focusing on the question of how many training tasks are required to guarantee near Bayes-optimal learning. This investigation allowed us to make progress on a principled approach to offline meta RL and, furthermore, opened a new theoretical research direction that we did not anticipate. [Rimon et al., Meta reinforcement learning with finite training tasks – a density estimation approach. NeurIPS 2022; Mutti and Tamar, Test-time regret minimization in meta reinforcement learning. ICML 2024]
b. We developed a method that uses efficient GPU-based physical simulation to perform inference for robotic manipulation tasks, establishing the validity of using physics simulation to speed up Bayesian inference in robotics (a schematic sketch of the general idea is given after this list). [Krupnik et al., Fine-tuning generative models as an inference method for robotic tasks. CoRL 2023]
c. Scaling up deep Bayesian RL: we scaled up meta RL to domains with image inputs, showing that exploring at test time improves generalization to new domains in RL [Zisselman et al., Explore to generalize in zero-shot RL. NeurIPS 2023]. We also scaled up meta RL to higher-dimensional task distributions, based on a novel model-based meta RL approach. [Rimon et al., MAMBA: an effective world model approach for meta-reinforcement learning. ICLR 2024]
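To illustrate how a fast, batched simulator can serve as the likelihood model in Bayesian inference (item b above), below is a minimal importance-sampling sketch: candidate task parameters are drawn from a prior, all candidates are simulated in parallel, and candidates are reweighted by how well they explain the observed outcome. The toy dynamics, prior, and noise model are our own illustrative assumptions, not the method of the CoRL 2023 paper.

# Schematic importance-sampling posterior over task parameters, with a batched
# simulator standing in as the likelihood model (all names here are illustrative).
import numpy as np

def simulate(theta, action):
    """Stand-in for a GPU-batched physics simulator: predicts the outcome of
    `action` under task parameters `theta`, vectorized over all candidates."""
    return theta * action + 0.05 * np.random.randn(*theta.shape)

def posterior_samples(observed_outcome, action, n=10_000, noise=0.05):
    theta = np.random.uniform(0.0, 2.0, size=n)   # candidates drawn from the prior
    predicted = simulate(theta, action)           # simulate all candidates in parallel
    log_w = -0.5 * ((observed_outcome - predicted) / noise) ** 2  # Gaussian likelihood
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = np.random.choice(n, size=n, p=w)        # resample -> approximate posterior
    return theta[idx]

posterior = posterior_samples(observed_outcome=1.2, action=1.0)
print(posterior.mean(), posterior.std())          # belief over the unknown parameter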
Objective 3: Explore Safety Certificates for Deep RL
Following the research plan, work on this objective will start during year 3 of the project.
Objective 4: Develop a Practical Deep Learning Framework for Robotic Manipulation
a. We started investigating the problem of using simulation to speed up Bayesian inference in robotics (see Objective 2b above). [Krupnik et al., Fine-tuning generative models as an inference method for robotic tasks. CoRL 2023]
b. We have begun investigating a unified framework for robotic manipulation. We have made progress in deformable object manipulation (ropes) [Sudry et al., Hierarchical planning for rope manipulation using knot theory and a learned inverse model. CoRL 2023] and in multi-object manipulation [Haramati et al., Entity-centric reinforcement learning for object manipulation from pixels. ICLR 2024].
So far, the achievements above have yielded 15 publications in top-tier machine learning conferences and journals. The main advances beyond the state of the art are the following:
1. Establishing the foundations of meta RL. So far, meta RL has mostly been studied empirically. We contributed the first theoretical investigation of meta RL, answering important questions such as “can we expect to learn near-optimal exploration behavior?” and “how many training tasks are required to learn near-optimal exploration?”. By framing the theoretical questions, our work opens the door to additional research into the theory of meta RL.
2. Object-centric representation using deep latent particles (DLP). The DLP representation is a novel image representation method that combines classical ideas from computer vision with modern deep generative models. The key idea is that the latent variables in a deep generative model can be ‘particles’ in the image, with specific locations and other properties such as size, opacity, and appearance features (an illustrative sketch of this particle layout is given after this list). This makes it possible to effectively represent scenes with multiple interacting objects, as is common in everyday robotics. So far, we have shown that DLPs can be used to improve the state of the art in various tasks, including image representation, video prediction, and reinforcement learning.
3. Generalization in RL by exploration. Generalization in RL is known to be a difficult problem, and one that I have personally tried to solve for several years. We recently had a breakthrough: we found that by training agents to explore their environment (using maximum-entropy RL), we obtain behavior that generalizes much better. Intuitively, this is because exploration behavior is harder to ‘memorize’. We exploited this discovery to develop a new algorithm that explores whenever it is uncertain (a schematic of the test-time decision rule is given after this list). Our algorithm, ExpGen, significantly improved upon the state of the art on a popular benchmark for generalization in RL (the ProcGen benchmark).
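To illustrate the particle structure described in item 2 above, below is a toy sketch of a latent-particle layout together with a simplistic renderer that paints each particle as a Gaussian blob weighted by its opacity. The tensor shapes and the renderer are our own simplification for illustration, not the DLP architecture.

# Toy layout of latent particles (fields follow the description above; shapes
# and the renderer are illustrative simplifications, not the DLP model).
import torch

K, F_DIM = 8, 16                        # particles per image, appearance feature size
particles = {
    "xy": torch.rand(K, 2) * 2 - 1,     # particle position in [-1, 1] image coordinates
    "scale": torch.rand(K, 1),          # spatial extent of the particle
    "opacity": torch.rand(K, 1),        # transparency (soft presence)
    "features": torch.randn(K, F_DIM),  # learned appearance features
}

def soft_particle_map(p, size=32):
    """Toy renderer: paint each particle as an isotropic Gaussian blob, weighted
    by its opacity, and sum the blobs into a single heatmap."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                            torch.linspace(-1, 1, size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                        # (size, size, 2)
    d2 = ((grid[None] - p["xy"][:, None, None]) ** 2).sum(-1)   # (K, size, size)
    blobs = torch.exp(-d2 / (2 * (0.1 + p["scale"][:, :, None]) ** 2))
    return (p["opacity"][:, :, None] * blobs).sum(0)            # (size, size)

heatmap = soft_particle_map(particles)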
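To illustrate the explore-when-uncertain rule from item 3 above, below is a schematic of a test-time action selector: if an ensemble of task-trained policies agrees on the action, the agreed action is taken; otherwise the agent falls back on a maximum-entropy exploration policy. The ensemble size, agreement threshold, and policy interfaces are illustrative assumptions, a simplification of the ExpGen algorithm rather than its implementation.

# Schematic test-time rule: exploit on ensemble agreement, explore otherwise.
import torch

def act(obs, task_policies, explore_policy, agree_thresh=0.8):
    """Act greedily when the ensemble of task policies agrees on an action;
    otherwise sample an action from the maximum-entropy exploration policy."""
    votes = torch.stack([p(obs).argmax(-1) for p in task_policies])  # one vote per head
    majority = votes.mode().values
    if (votes == majority).float().mean() >= agree_thresh:  # agreement -> exploit
        return int(majority)
    probs = torch.softmax(explore_policy(obs), dim=-1)      # uncertainty -> explore
    return int(torch.multinomial(probs, 1))

# Toy usage with linear policy heads over a 4-dim observation and 5 actions.
obs = torch.randn(4)
heads = [torch.nn.Linear(4, 5) for _ in range(5)]
action = act(obs, heads, explore_policy=torch.nn.Linear(4, 5))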
To ensure further success, we will need to devote more resources to scaling up our investigation. Recently, it has been observed that many robotic manipulation problems may be solved simply by “scale”, that is, by using straightforward algorithms such as imitation learning while collecting data via teleoperation from many human demonstrators. This approach has mostly been taken by industrial labs, and has shown promising generalization abilities. A mitigation plan here is to use publicly available data, such as that of the Open X-Embodiment project, and to adapt the Bayesian RL methodology to exploit such data.