Periodic Reporting for period 2 - LeMo (Learning Mobility for Real Legged Robots)
Reporting period: 2021-07-01 to 2022-12-31
This new learning-based perceptive locomotion has the potential to massively outperform existing approaches. In particular, robustness in uncertain and unstructured environments can improve significantly. Moreover, the systems can be pushed much closer to their performance limits (e.g. actuation torque and speed limits), since the proposed methods handle nonlinearities and constraints better.
Overall, the work is expected to be essential for a new generation of mobile robotic systems that will find application in various fields such as industrial inspection and maintenance, search and rescue, or planetary exploration. Thanks to this technology, legged robots, which still suffered from frequent falls and failures at the start of this project, are expected to traverse extremely challenging terrain with unprecedented mobility and reliability by the end of this project.
The proposed tools are expected to be holistic and generalizable, meaning that new machines can be made controllable in a short time and with comparably low development effort. This will lower the entry barrier and shorten development time for researchers and engineers, and open new opportunities for commercial solutions.
In [11] we further incorporated exteroceptive perception, i.e. terrain elevation maps, as direct input to the control policy. While rigid and visible obstacles are easy to handle, such an approach fails when the robot navigates across compliant ground such as tall grass or snow, or when the state estimate, and correspondingly the elevation map, drifts. To overcome this issue, we introduced an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end to end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed, in which the robot relies on exteroceptive perception only when it is reliable.
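The gating idea behind such a fusion encoder can be sketched in a few lines of Python. This is a minimal illustration only, not the published architecture: all dimensions, weights, and the linear gate are hypothetical, and the real encoder is trained end to end rather than hand-designed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedFusionEncoder:
    """Toy recurrent encoder: an attention-like gate, computed from
    proprioception and the hidden belief state, decides how much the
    exteroceptive features (e.g. elevation-map samples) are trusted.
    Weights are random placeholders, not learned parameters."""

    def __init__(self, proprio_dim, extero_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = proprio_dim + extero_dim + hidden_dim
        # gate over exteroceptive features, conditioned on proprioception + state
        self.W_gate = rng.normal(0, 0.1, (extero_dim, proprio_dim + hidden_dim))
        # recurrent update of the belief state
        self.W_h = rng.normal(0, 0.1, (hidden_dim, in_dim))

    def step(self, h, proprio, extero):
        # gate in [0,1]^extero_dim; near 0 when exteroception looks unreliable
        gate = sigmoid(self.W_gate @ np.concatenate([proprio, h]))
        x = np.concatenate([proprio, gate * extero, h])
        h_next = np.tanh(self.W_h @ x)
        return h_next, gate

enc = GatedFusionEncoder(proprio_dim=8, extero_dim=16, hidden_dim=32)
h = np.zeros(32)
for t in range(10):
    proprio = np.zeros(8)        # placeholder joint/IMU readings
    extero = 0.5 * np.ones(16)   # placeholder height-map samples
    h, gate = enc.step(h, proprio, extero)
```

In the trained controller this gating emerges from end-to-end learning; the sketch only shows where such a gate sits in the data flow.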
Although we developed dedicated simulation tools, which have been commercialised through www.raisim.com, the works in [1,11] still required substantial training time (multiple hours or days). This slows down the development process and prevents such tools from being used, e.g., for design optimization. In collaboration with industry, we advanced the technology by leveraging parallel simulation on GPUs and reduced training time to a few minutes [18].
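The core idea of such massively parallel simulation is to step thousands of environments as one batched array operation instead of one loop per robot. The following sketch uses a toy point-mass model and a placeholder linear policy in numpy; on a GPU the same pattern runs as batched tensor operations, which is what yields the speed-up.

```python
import numpy as np

num_envs, state_dim, act_dim = 4096, 6, 3
rng = np.random.default_rng(0)

states = np.zeros((num_envs, state_dim))            # [pos(3), vel(3)] per env
policy_W = rng.normal(0, 0.1, (act_dim, state_dim))  # placeholder linear policy

def step_all(states, actions, dt=0.02):
    """Step ALL environments with one vectorized update (toy dynamics)."""
    pos, vel = states[:, :3], states[:, 3:]
    vel = vel + dt * actions   # integrate commanded acceleration
    pos = pos + dt * vel
    return np.concatenate([pos, vel], axis=1)

for t in range(100):
    actions = states @ policy_W.T   # batched policy evaluation for 4096 envs
    states = step_all(states, actions)
```

The real systems in [18] simulate articulated robots with contacts rather than point masses, but the batching structure is the same.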
Towards autonomy of our systems, we developed path planning and navigation strategies [2,3,6,16], which leverage learned traversability costs for different terrains [5,17]. Since the maps are very often imperfect due to reflections, occlusions, or sensor noise, we improved the environment representations (e.g. elevation maps) required by our locomotion controller using machine learning methods [15,21] and trained navigation policies that directly leverage the incoming raw sensor stream [9]. For large-scale autonomy, which requires a robust mapping and localization pipeline, we applied our learning-based methods to achieve more accurate lidar odometry [4] and to estimate localizability [19], and developed a multi-modal fusion approach for precise and robust localization [20].
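To illustrate what a traversability cost over an elevation map looks like, the sketch below assigns high cost to steep or rough grid cells. The slope and roughness terms and their weights are hypothetical hand-crafted stand-ins; the costs in [5,17] are learned rather than hand-designed.

```python
import numpy as np

def traversability_cost(elev, cell_size=0.1, slope_w=1.0, rough_w=2.0):
    """Toy per-cell traversability cost for a grid elevation map:
    penalize steep local slope and deviation from neighbouring cells."""
    gy, gx = np.gradient(elev, cell_size)
    slope = np.hypot(gx, gy)                    # local incline magnitude
    rough = np.zeros_like(elev)
    neigh_mean = (elev[:-2, 1:-1] + elev[2:, 1:-1] +
                  elev[1:-1, :-2] + elev[1:-1, 2:]) / 4.0
    rough[1:-1, 1:-1] = np.abs(elev[1:-1, 1:-1] - neigh_mean)
    return slope_w * slope + rough_w * rough

elev = np.zeros((20, 20))
elev[10:, :] = 1.0                # a 1 m step across the terrain
cost = traversability_cost(elev)  # high cost along the step edge, ~0 on flat ground
```

A planner would then search for minimum-cost paths over such a grid, which is where the navigation strategies in [2,3,6,16] come in.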
So far, classic approaches have treated mapping, navigation planning, path following, and locomotion control separately. In the coming years, we will research alternative ways by tackling the end-to-end problem. As a first successful step in this direction, in [23] we directly learn locomotion and navigation together, which enables the robot to perform manoeuvres such as jumping over a gap or onto a table. By the end of this ERC, we expect our robots to autonomously complete parkour courses.
The second major improvement is expected in the integration of perception into our locomotion and navigation pipelines. We will develop control methods that directly leverage raw sensor streams and work on navigation algorithms that can utilise semantic understanding of the environment to guide the robot.
The proposed approach of using reinforcement learning to control highly dynamic, nonlinear, and hard-to-model systems finds application in many areas beyond those proposed in this ERC project. As a demonstration, we have transferred and tested it in the context of space robotics [7], construction robotics [10], and in combination with manipulation [13]; many other challenging problems will follow.