
Learning Mobility for Real Legged Robots

Periodic Reporting for period 2 - LeMo (Learning Mobility for Real Legged Robots)

Reporting period: 2021-07-01 to 2022-12-31

LeMo investigates the use of machine learning to increase the mobility of legged robotic systems, enabling them to achieve unprecedented locomotion and navigation skills in challenging environments. The goal is to leverage simulation tools, paired with models from the real world, to generate the data necessary for training locomotion policies that can be transferred to and deployed on real systems. The developed methods have the potential to create a paradigm change from classical model- and optimization-based control to systems that are controlled by neural networks. Beyond pure local control, the project aims to integrate perception and planning to enable far-sighted, environment-aware autonomous legged navigation. While the technology is mostly tested and demonstrated on a series of quadrupedal robots in the context of locomotion, the approach and the developed tools are transferable to other systems and problems.

This new learning-based perceptive locomotion has the potential to massively outperform existing approaches. In particular, robustness when dealing with uncertain and unstructured environments can be significantly improved. Moreover, the systems can be pushed much closer to their performance limits (e.g. actuation torque and speed limits), since the proposed methods can better handle nonlinearities and constraints.

Overall, the work is expected to be essential for a new generation of mobile robotic systems that will find application in fields such as industrial inspection and maintenance, search and rescue, or planetary exploration. Thanks to this technology, legged robots, which still suffered from falls and failures at the start of this project, are expected to traverse extremely challenging terrain with unprecedented mobility and reliability by the end of the project.

The proposed tools are expected to be holistic and generalizable, meaning that new machines can be made controllable in a short time and with comparably low development effort. This will lower the entry barrier and development time for researchers and engineers and open up new opportunities for commercial solutions.
In the first 2.5 years, our team made substantial progress and demonstrated the superiority of the proposed approaches in scientific publications and real-world demonstrations. In two seminal Science Robotics papers [1,11], we presented the state of the art for legged locomotion in unstructured terrain and outperformed previous approaches in terms of mobility and robustness. In [1], we introduced a privileged training approach with a teacher-student setup that enables the robot to implicitly estimate terrain properties and disturbances during operation and to react accordingly. As a result, the quadrupedal robot can blindly move across a variety of terrains in outdoor environments such as forests, alpine areas, or gravel pits with unprecedented robustness. Without any handcrafted heuristics, it learned to adapt its gait to the haptically perceived terrain and to trigger reflexes when needed.
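As a rough illustration of the teacher-student idea, the sketch below distills a teacher policy (assumed here to be already trained with reinforcement learning) that sees privileged simulation state into a student policy that only sees a proprioceptive observation history. All dimensions, network sizes, and the dummy data are illustrative assumptions, not the setup of [1].

```python
# Minimal sketch of privileged teacher-student distillation (illustrative only).
import torch
import torch.nn as nn

OBS_DIM, PRIV_DIM, HIST_LEN, ACT_DIM = 48, 50, 32, 12  # assumed sizes

def mlp(in_dim, out_dim, hidden=(256, 128)):
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ELU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# Teacher: sees privileged simulation state (e.g. terrain friction, contacts).
teacher = mlp(OBS_DIM + PRIV_DIM, ACT_DIM)
# Student: sees only a history of proprioceptive observations and must
# implicitly infer the privileged information from it.
student = mlp(OBS_DIM * HIST_LEN, ACT_DIM)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_step(obs, priv, obs_history):
    """One imitation step: the student matches the (frozen) teacher's action."""
    with torch.no_grad():
        target_action = teacher(torch.cat([obs, priv], dim=-1))
    student_action = student(obs_history.flatten(1))
    loss = nn.functional.mse_loss(student_action, target_action)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch standing in for rollouts collected in simulation.
obs = torch.randn(64, OBS_DIM)
priv = torch.randn(64, PRIV_DIM)
hist = torch.randn(64, HIST_LEN, OBS_DIM)
print(distillation_step(obs, priv, hist))
```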
In [11], we further incorporated exteroceptive perception, i.e. terrain elevation maps, as a direct input to the control policy. While rigid and visible obstacles are easy to handle, such an approach fails when the robot navigates across compliant ground such as tall grass or snow, or when state estimation and, correspondingly, elevation mapping drift. To overcome this issue, we introduced an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end to end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed, in which the robot relies on exteroceptive perception only when it is reliable.
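One plausible form such a gated recurrent fusion could take is sketched below: a GRU builds a belief state from proprioception and an encoded elevation-map patch, and a learned gate attenuates the exteroceptive contribution when it appears unreliable. Layer sizes and the gating scheme are illustrative assumptions, not the published architecture of [11].

```python
# Illustrative gated recurrent encoder fusing proprioception and exteroception.
import torch
import torch.nn as nn

class GatedBeliefEncoder(nn.Module):
    def __init__(self, proprio_dim=48, extero_dim=208, latent_dim=128):
        super().__init__()
        self.extero_enc = nn.Sequential(nn.Linear(extero_dim, latent_dim), nn.ELU())
        self.gru = nn.GRU(proprio_dim + latent_dim, latent_dim, batch_first=True)
        # Gate decides, per latent channel, how much exteroception to trust.
        self.gate = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.Sigmoid())

    def forward(self, proprio, extero, hidden=None):
        # proprio: (batch, time, proprio_dim); extero: (batch, time, extero_dim)
        e = self.extero_enc(extero)
        belief, hidden = self.gru(torch.cat([proprio, e], dim=-1), hidden)
        alpha = self.gate(belief)      # close to 0 when exteroception is unreliable
        fused = belief + alpha * e     # attenuated exteroceptive contribution
        return fused, hidden

enc = GatedBeliefEncoder()
fused, h = enc(torch.randn(8, 16, 48), torch.randn(8, 16, 208))
print(fused.shape)  # torch.Size([8, 16, 128])
```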

Although we developed dedicated simulation tools, which have been commercialised through www.raisim.com, the works in [1,11] still required substantial training time (multiple hours or days). This slows down the development process and prevents people from using such tools, e.g. for design optimization. In collaboration with industry, we advanced the technology by leveraging parallel simulation on GPUs and reduced training time to a few minutes [18].
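The core idea of GPU-parallel simulation is that thousands of environments are stepped as one batched tensor operation, so the rollouts needed for a policy update are collected in seconds rather than hours. The toy sketch below uses a placeholder point-mass integrator instead of a rigid-body simulator and only illustrates the batching pattern, not the tooling of [18].

```python
# Toy illustration of massively parallel environment stepping on the GPU.
import torch

NUM_ENVS, STATE_DIM, ACT_DIM = 4096, 4, 2   # assumed sizes
device = "cuda" if torch.cuda.is_available() else "cpu"

state = torch.zeros(NUM_ENVS, STATE_DIM, device=device)   # [x, y, vx, vy]

def step(state, action, dt=0.02):
    """Advance all environments simultaneously with one batched operation."""
    pos, vel = state[:, :2], state[:, 2:]
    vel = vel + dt * action                  # placeholder point-mass dynamics
    pos = pos + dt * vel
    return torch.cat([pos, vel], dim=-1)

policy = torch.nn.Linear(STATE_DIM, ACT_DIM).to(device)

# One rollout chunk: 4096 envs x 24 steps, roughly 100k state transitions.
for _ in range(24):
    with torch.no_grad():
        action = policy(state)
    state = step(state, action)
print(state.shape)  # torch.Size([4096, 4])
```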

Towards autonomy of our systems, we developed path planning and navigation strategies [2,3,6,16] that leverage learned traversability costs for different terrains [5,17]. Since the maps are often imperfect due to reflections, occlusions, or sensor noise, we improved the environment representations (e.g. elevation maps) required by our locomotion controller using machine learning methods [15,21] and trained navigation policies that directly leverage the incoming raw sensor streams [9]. For large-scale autonomy, which requires a robust mapping and localization pipeline, we applied our learning-based methods to more accurate lidar odometry [4] and to estimating localizability [19], and developed a multi-modal fusion approach for precise and robust localization [20].
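For illustration, the sketch below computes a hand-crafted traversability cost (slope plus local roughness) from a 2.5D elevation map; the learned costs in [5,17] replace such heuristics with data-driven estimates, so this only demonstrates the kind of cost map a planner would query.

```python
# Hand-crafted (non-learned) traversability cost over an elevation map.
import numpy as np
from scipy.ndimage import uniform_filter

def traversability_cost(elevation, cell_size=0.04, w_slope=1.0, w_rough=2.0):
    """elevation: (H, W) height map in metres; returns a (H, W) cost map."""
    dz_dy, dz_dx = np.gradient(elevation, cell_size)
    slope = np.sqrt(dz_dx**2 + dz_dy**2)                  # terrain steepness
    # Local roughness: deviation of each cell from its 3x3 neighbourhood mean.
    rough = np.abs(elevation - uniform_filter(elevation, size=3, mode="nearest"))
    return w_slope * slope + w_rough * rough

cost = traversability_cost(np.random.rand(80, 80) * 0.1)
print(cost.shape, cost.max())
```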
To better understand the problem and find impactful solutions, we test our technology not only in theory and in simulation but also evaluate it thoroughly in extensive field tests. For example, in this first phase we participated in (and won) the DARPA SubT Challenge, where the ANYmal robots running the software from [1,11,5,16] exhibited the best locomotion performance of all systems.

So far, classic approaches have treated mapping, navigation planning, path following, and locomotion control separately. In the coming years, we will research alternative ways by looking at the end-to-end problem. As a first successful step in this direction, in [23] we directly learn locomotion and navigation together, which enables the robot to perform manoeuvres such as jumping over a gap or onto a table. By the end of this ERC, we expect our robots to autonomously complete parkour-style courses.

The second major improvement is expected in the integration of perception into our locomotion and navigation pipelines. We will develop control methods that directly leverage raw sensor streams and work on navigation algorithms that can use semantic understanding of the environment to guide the robot.

The proposed approach of using reinforcement learning to control highly dynamic, nonlinear, and complex-to-model systems finds application in many areas beyond those proposed in this ERC project. As a demonstration, we transfer and test it in the context of space robotics [7], construction robotics [10], and in combination with manipulation [13]; many other challenging problems will follow.
ANYmal navigating in a natural environment using a learned locomotion control policy