"Our first major achievement is the ""Intelligent Trial & Error"" algorithm (IT&E). This algorithm allows a 6-legged robot to recover from many damage conditions (e.g. a broken leg, 2 missing leg, etc.) in less than 2 minutes, and without needing to perform a diagnosis. This algorithm was also successfully tested on a simple robotic arm, with similar adaption time. Overall, IT&E is several order of magnitude more data-efficient than general-purpose reinforcement learning algorithms and paves the way for robots that can adapt to unforeseen situations by trial-and-error.
Our second achievement is the "Reset-Free Trial & Error" (RTE) algorithm, which extends the ideas introduced in IT&E and makes them usable in real-life scenarios: instead of relying on learning episodes that always start from the same state, RTE allows a mobile robot to "learn while doing", without any reset. Concretely, the robot takes the environment into account to choose control policies that are likely to help it achieve its task, while improving its predictions about the outcome of each possible policy. This algorithm was tested on a 6-legged walking robot, which was able to learn from its mistakes and reach target points in the environment in spite of a missing leg.
Both the IT&E and RTE algorithms critically rely on another new algorithm, called MAP-Elites (and its extension CVT-MAP-Elites). MAP-Elites is a novel kind of evolutionary algorithm that does not attempt to find the optimum of a function, but instead searches for a diverse set of high-performing solutions (e.g. 10,000 solutions that are all different but all high-performing). This algorithm opened many new research avenues for evolutionary computation and is part of a new class of algorithms called "illumination algorithms" or "quality diversity algorithms".
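The idea of searching for a diverse set of high-performing solutions, rather than a single optimum, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our C++ implementation: the behaviour space is a single dimension split into 10 cells, solutions are two numbers in [0, 1], and the fitness function, behaviour descriptor, mutation strength and iteration counts are all hypothetical choices.

```python
import random


def map_elites(fitness, descriptor, n_cells=10, n_init=100, n_iters=5000,
               sigma=0.1, seed=0):
    """Keep the best solution found so far (the "elite") in each cell."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)

    def try_insert(x):
        # The descriptor (assumed to lie in [0, 1]) decides which cell x competes in.
        cell = min(int(descriptor(x) * n_cells), n_cells - 1)
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)

    # 1. Bootstrap the archive with random solutions.
    for _ in range(n_init):
        try_insert([rng.random(), rng.random()])

    # 2. Repeatedly pick a random elite, mutate it, and try to insert the child.
    for _ in range(n_iters):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent]
        try_insert(child)
    return archive


def toy_fitness(x):
    """Hypothetical quality measure: best when the second gene equals 0.5."""
    return 1.0 - (x[1] - 0.5) ** 2


def toy_descriptor(x):
    """Hypothetical behaviour descriptor: the first gene."""
    return x[0]


archive = map_elites(toy_fitness, toy_descriptor)
for cell in sorted(archive):
    f, x = archive[cell]
    print(cell, round(f, 3), [round(g, 2) for g in x])
```

The result is a map with one elite per cell instead of a single best solution. A solution only competes with others that share its cell, which is what preserves diversity.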
Our fourth achievement is the "Black-box Data-Efficient Robot Policy Search" (Black-DROPS) algorithm, a model-based reinforcement learning algorithm that is (1) highly flexible, which makes it easy to adapt to many problems and robots, and (2) highly parallelizable, which makes it possible to exploit multi-core computers. This algorithm was successfully tested on a robotic manipulator and on our 6-legged robot. Depending on the assumptions made, it can usually learn policies by trial and error in fewer than 10 episodes.
All these algorithms have been implemented in C++11 within our generic, open-source framework called Limbo (https://github.com/resibots/limbo). Limbo implements fast Gaussian processes and state-of-the-art optimization algorithms.