CORDIS - EU research results

Robots with animal-like resilience

Periodic Reporting for period 4 - ResiBots (Robots with animal-like resilience)

Reporting period: 2019-11-01 to 2020-10-31

Despite over 50 years of research in robotics, most existing robots are far from being as resilient as the simplest animals: they are fragile machines that easily stop functioning in difficult conditions. This fragility prevents us from deploying robots in many situations (e.g. natural disasters or long-term observation missions).

The objective of the ResiBots project is to introduce a powerful and general approach to failure recovery based on novel trial-and-error learning algorithms (reinforcement learning). The main challenge is to design learning algorithms that can learn in a few minutes (a dozen trials), instead of the days and thousands of trials required by traditional algorithms. Our main insight is that we can leverage the model of the intact robot as a prior for a data-driven model of either the dynamics or the expected cumulative reward; these models can be exploited to make reinforcement learning much more data-efficient.

Our first major achievement is the "Intelligent Trial & Error" algorithm (IT&E). This algorithm allows a 6-legged robot to recover from many damage conditions (e.g. a broken leg, two missing legs) in less than 2 minutes, without needing to perform a diagnosis. It was also successfully tested on a simple robotic arm, with similar adaptation times. Overall, IT&E is several orders of magnitude more data-efficient than general-purpose reinforcement learning algorithms and paves the way for robots that can adapt to unforeseen situations by trial and error.

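The adaptation step of IT&E can be sketched as Bayesian optimization over a precomputed behavior map, where the performance of each behavior in simulation serves as the prior mean of a Gaussian process and real trials only correct that prior. The following minimal Python sketch is not the project's implementation: the one-dimensional toy map, the squared-exponential kernel, the upper-confidence-bound rule with `kappa = 0.3`, and the stopping ratio of 0.9 are assumptions, and `damaged_robot` is a hypothetical stand-in for testing a behavior on the real robot.

```python
import math


def kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel over 1-D behavior descriptors (assumed)."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (fine for a few dozen points)."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_predict(xs, residuals, query, noise=1e-3):
    """Mean and variance of a zero-mean GP fitted to (real - simulated) residuals."""
    if not xs:
        return 0.0, 1.0
    k_matrix = [[kernel(a, b) + (noise if i == j else 0.0)
                 for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(query, x) for x in xs]
    alpha = solve(k_matrix, residuals)
    v = solve(k_matrix, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    variance = max(kernel(query, query) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, variance


def intelligent_trial_and_error(behavior_map, evaluate_on_robot,
                                kappa=0.3, stop_ratio=0.9, max_trials=20):
    """behavior_map maps a behavior descriptor to its performance in simulation."""
    tested, residuals, scores = [], [], []
    for _ in range(max_trials):
        # Predicted real performance = simulated performance (prior) + GP correction.
        predictions = {}
        for d, sim_perf in behavior_map.items():
            mu, var = gp_predict(tested, residuals, d)
            predictions[d] = (sim_perf + mu, math.sqrt(var))
        best_predicted = max(mean for mean, _ in predictions.values())
        # Stop once a tested behavior reaches stop_ratio x the best prediction.
        if scores and max(scores) >= stop_ratio * best_predicted:
            break
        # Upper confidence bound: favor high predicted performance or high uncertainty.
        choice = max(predictions, key=lambda d: predictions[d][0] + kappa * predictions[d][1])
        score = evaluate_on_robot(choice)
        tested.append(choice)
        residuals.append(score - behavior_map[choice])
        scores.append(score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return tested[best], scores[best], len(scores)


# Hypothetical 1-D behavior map computed offline in simulation (e.g. by MAP-Elites).
behavior_map = {i / 20: 1.0 - (i / 20 - 0.3) ** 2 for i in range(21)}


def damaged_robot(d):
    """Hypothetical damage: behaviors with d < 0.5 rely on the broken leg."""
    return behavior_map[d] if d >= 0.5 else 0.1


print(intelligent_trial_and_error(behavior_map, damaged_robot))
```

In this toy run, the behavior the simulator rates best fails on the damaged robot, the GP lowers the predictions of its neighbors, and the search moves to behaviors that do not depend on the hypothetical broken leg, without any explicit diagnosis.
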
Our second achievement is the "Reset-Free Trial & Error" algorithm (RTE), which extends the ideas introduced in IT&E and makes them usable in real-life scenarios: instead of using learning episodes, which always start from the same state, RTE allows a mobile robot to "learn while doing" without any reset. Concretely, the robot takes its environment into account to choose control policies that are likely to help it achieve its task, while improving its predictions about the outcome of each possible policy. This algorithm was tested on a 6-legged walking robot, which was able to learn from its mistakes and reach target points in the environment despite a missing leg.
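
The "learn while doing" loop can be illustrated with a deliberately simplified Python sketch. Everything in it is hypothetical: an eight-policy repertoire whose outcomes were predicted in simulation, a damaged robot that veers 45 degrees to the left and covers half the distance, a greedy one-step choice instead of a longer-horizon planner, and a per-policy average of observed displacements standing in for the Gaussian-process correction used in the project. The point it shows is that the robot never resets: each trial starts where the previous one ended, and every trial both moves the robot and refines its predictions.

```python
import math

# Hypothetical repertoire: 8 policies, each predicted (in a simulation of the
# intact robot) to move the robot one unit in one of 8 compass directions.
REPERTOIRE = {i: (math.cos(i * math.pi / 4), math.sin(i * math.pi / 4)) for i in range(8)}


def damaged_robot_step(policy):
    """Hypothetical damage: every policy veers 45 degrees left at half the distance."""
    angle = (policy + 1) * math.pi / 4
    return 0.5 * math.cos(angle), 0.5 * math.sin(angle)


def predicted_displacement(policy, history):
    """Simulated prior until the policy is tried, then the mean observed outcome."""
    outcomes = history.get(policy)
    if not outcomes:
        return REPERTOIRE[policy]
    n = len(outcomes)
    return sum(o[0] for o in outcomes) / n, sum(o[1] for o in outcomes) / n


def reset_free_navigation(target, max_steps=30, tolerance=0.5):
    """Learn while doing: no resets, each trial starts where the last one ended."""
    position = (0.0, 0.0)
    history = {}  # policy -> list of observed (dx, dy)
    for step in range(max_steps):
        if math.dist(position, target) < tolerance:
            return position, step

        def distance_after(policy):
            dx, dy = predicted_displacement(policy, history)
            return math.dist((position[0] + dx, position[1] + dy), target)

        # Greedy one-step choice: the policy predicted to end closest to the target.
        policy = min(REPERTOIRE, key=distance_after)
        dx, dy = damaged_robot_step(policy)  # execute on the (damaged) robot
        history.setdefault(policy, []).append((dx, dy))
        position = (position[0] + dx, position[1] + dy)
    return position, max_steps


final_position, steps = reset_free_navigation(target=(3.0, 0.0))
print(steps, [round(c, 2) for c in final_position])  # → 6 [2.85, 0.35]
```

The robot first tries the policy that should move it straight toward the target, observes that it drifts, and from then on picks the policy whose observed outcome points at the target.
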

Our third achievement is MAP-Elites (and its extension CVT-MAP-Elites), on which both the IT&E and RTE algorithms critically rely. MAP-Elites is a novel kind of evolutionary algorithm that does not attempt to find the optimum of a function, but instead searches for a diverse set of high-performing solutions (e.g. 10,000 solutions that are all different from one another but all high-performing). This algorithm opened many new research avenues in evolutionary computation and belongs to a new class of algorithms called "illumination algorithms" or "quality diversity algorithms".
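
The core loop of MAP-Elites fits in a few lines. This Python sketch uses the grid-based variant (CVT-MAP-Elites instead partitions the behavior space with a centroidal Voronoi tessellation); the toy fitness and behavior descriptor, the 10 x 10 grid, the Gaussian mutation with `sigma = 0.1`, and the iteration budget are assumptions chosen for illustration.

```python
import random


def map_elites(fitness, descriptor, genome_size, cells_per_axis=10,
               iterations=5000, random_init=100, sigma=0.1, seed=0):
    """Grid-based MAP-Elites: keep the best genome found in each descriptor cell."""
    rng = random.Random(seed)
    archive = {}  # cell coordinates -> (fitness, genome)

    def cell_of(genome):
        # Descriptor components are assumed to lie in [0, 1].
        return tuple(min(int(d * cells_per_axis), cells_per_axis - 1)
                     for d in descriptor(genome))

    for i in range(iterations):
        if i < random_init:
            genome = [rng.random() for _ in range(genome_size)]
        else:
            # Pick a random elite and apply Gaussian mutation, clipped to [0, 1].
            _, parent = rng.choice(list(archive.values()))
            genome = [min(max(g + rng.gauss(0.0, sigma), 0.0), 1.0) for g in parent]
        cell, score = cell_of(genome), fitness(genome)
        # A new genome only competes with the current elite of its own cell.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, genome)
    return archive


# Hypothetical toy problem: genes 0-1 define the behavior, genes 2-3 its quality.
def toy_fitness(genome):
    return -sum((g - 0.5) ** 2 for g in genome[2:])


def toy_descriptor(genome):
    return genome[:2]


archive = map_elites(toy_fitness, toy_descriptor, genome_size=4)
print(len(archive), "of 100 cells filled")
```

Because each candidate competes only with the elite of its own cell, the archive ends up holding one high-performing solution per behavior cell rather than a single optimum.
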

Our fourth achievement is the "Black-box Data-Efficient Robot Policy Search" (Black-DROPS) algorithm, a model-based reinforcement learning algorithm that is (1) highly flexible, which makes it easy to adapt to many problems and robots, and (2) highly parallelizable, which makes it possible to exploit multi-core computers. This algorithm was successfully tested on a robotic manipulator and on our 6-legged robot. Depending on the assumptions made, it can usually learn policies by trial and error in fewer than 10 episodes.
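
The model-based loop behind this kind of policy search (run the policy, learn a model from all data, optimize the policy on the model with a black-box optimizer, repeat) can be sketched on a toy problem. In this Python sketch, the one-dimensional linear system, the least-squares model (instead of Gaussian processes), the linear policy, and plain random search (instead of a population-based optimizer such as CMA-ES) are all assumptions; it shows the structure of the loop, not the Black-DROPS algorithm itself.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TRUE_GAIN = 0.5  # hypothetical real dynamics x' = x + 0.5 * u, unknown to the learner
HORIZON = 10


def real_system_step(x, u):
    return x + TRUE_GAIN * u


def rollout(step, theta, x0=1.0):
    """Cumulative reward of the linear policy u = -theta * x under dynamics `step`."""
    x, total = x0, 0.0
    for _ in range(HORIZON):
        u = -theta * x
        x = step(x, u)
        total -= x * x + 0.01 * u * u  # drive x to 0 with small actions
    return total


def fit_gain(transitions):
    """Least-squares fit of b in x' - x = b * u (a stand-in for a GP model)."""
    num = sum(u * (x_next - x) for x, u, x_next in transitions)
    den = sum(u * u for _, u, _ in transitions)
    return num / den if den > 0 else 0.0


def policy_search(episodes=5, candidates=64, seed=0):
    rng = random.Random(seed)
    transitions, gain = [], 0.0
    theta = rng.uniform(-1.0, 1.0)  # random initial policy
    for _ in range(episodes):
        # 1) Run the current policy on the real system and record transitions.
        x = 1.0
        for _ in range(HORIZON):
            u = -theta * x
            x_next = real_system_step(x, u)
            transitions.append((x, u, x_next))
            x = x_next
        # 2) Learn a dynamics model from all data collected so far.
        gain = fit_gain(transitions)

        def model_step(x, u, b=gain):
            return x + b * u

        # 3) Black-box policy optimization on the model (random search here).
        #    Evaluations are independent, so they can be dispatched concurrently;
        #    threads only illustrate this (CPython's GIL limits the speed-up).
        thetas = [theta] + [rng.uniform(0.0, 4.0) for _ in range(candidates)]
        with ThreadPoolExecutor() as pool:
            scores = list(pool.map(lambda t: rollout(model_step, t), thetas))
        theta = thetas[scores.index(max(scores))]
    return theta, gain


theta, gain = policy_search()
print(round(gain, 3), round(theta, 2))
```

Because each candidate is scored only through the learned model, these evaluations never touch the robot and are independent of one another, which is what makes this family of methods easy to parallelize.
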

All these algorithms have been implemented in C++11 within our generic, open-source framework called Limbo (https://github.com/resibots/limbo), which implements fast Gaussian processes and state-of-the-art optimization algorithms.

The current approach to failure recovery is mostly inherited from the engineering of safety-critical systems (e.g. nuclear plants or spacecraft). In these contexts, reliability is typically achieved by (1) extensive testing before deployment to ensure the best possible robustness, and (2) self-diagnosis procedures followed by a search for a contingency plan. Such approaches are expensive because they involve a long engineering process and an abundance of internal sensors. More importantly, they require anticipating all the potential causes of failure, both to design the robot so that it is robust to them and to put internal sensors "at the right place" to perform an accurate diagnosis. For instance, most robots cannot see their own back with their cameras, which prevents them from diagnosing any problem there (e.g. a tree branch that changes their center of mass).

The ResiBots project goes beyond the state of the art by proposing a new, general approach to damage recovery that does not require any diagnosis of the failure. To this end, the project introduced several trial-and-error algorithms that allow damaged robots to discover compensatory behaviors in less than 2 minutes (only a dozen trials), compared to hours or often days with traditional reinforcement learning algorithms. All of them leverage models based on Gaussian processes, with a simulator of the intact robot as a prior.
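
A common way to use a simulator of the intact robot as a Gaussian-process prior, given here as the textbook formulation rather than the project's exact models, is to make the simulator's prediction the prior mean $m(x)$ so that the GP only has to learn the gap between simulation and reality. Here $x_1, \dots, x_n$ are the inputs tried on the real robot, $\mathbf{y}$ the observed outcomes, $m(X) = [m(x_1), \dots, m(x_n)]^\top$, $K_{ij} = k(x_i, x_j)$ for a kernel $k$, $\mathbf{k}(x) = [k(x, x_1), \dots, k(x, x_n)]^\top$, and $\sigma_n^2$ an assumed observation-noise variance:

```latex
\begin{aligned}
\mu(x)      &= m(x) + \mathbf{k}(x)^\top \left(K + \sigma_n^2 I\right)^{-1} \left(\mathbf{y} - m(X)\right) \\
\sigma^2(x) &= k(x, x) - \mathbf{k}(x)^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}(x)
\end{aligned}
```

Far from any trial, $\mathbf{k}(x) \approx \mathbf{0}$, so the prediction falls back to the simulator, $\mu(x) \approx m(x)$, with the prior variance $k(x, x)$; near the trials, the observations override the simulator. Intuitively, the trials then only need to correct the prior where it is wrong rather than learn a model from scratch.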

Overall, the ResiBots project leverages data-efficient reinforcement learning to make robots more resilient.
[Figure: Illustration of a damaged 6-legged walking robot]