Periodic Reporting for period 3 - AGILEFLIGHT (Low-latency Perception and Action for Agile Vision-based Flight)
Reporting period: 2023-09-01 to 2025-02-28
We focused on learning agile end-to-end control policies for quadcopters, trained entirely in simulation and deployed with zero or minimal fine-tuning in the real world. Two applications were targeted: 1) agile navigation in unknown environments (e.g., forests, search-and-rescue scenarios), and 2) autonomous drone racing:
- We demonstrated, for the first time, autonomous flight at speeds of up to 40 km/h in cluttered forests, snowy terrain, and search-and-rescue environments.
- We presented a neural-network controller, trained via deep reinforcement learning rather than designed as a classic controller, that races vision-based autonomous quadcopters at speeds competitive with human world champions and even outpaces them. The paper was published in Nature [1] and featured in The Guardian.
- We investigated why reinforcement learning outperforms optimal control at racing. The key finding is that RL can directly optimize a sparse, task-level, non-differentiable objective, allowing it to discover new, robust control behaviors (a minimal sketch of such a task-level reward follows this list). The paper was published in Science Robotics.
- We presented the first end-to-end vision-based neural-network controller trained via RL that can fly a drone fast through a racing course without state estimation, IMU, or SLAM. The paper was among the award-winning papers at RSS 2024.
- We presented the first VO-SLAM algorithm that uses RL to tune its parameters at deployment time. The algorithm was transferred to NASA JPL for the next Mars helicopter mission.
- We presented the first optimal controller (MPC) whose cost function is learned via RL, obtained by stacking a differentiable MPC block after the last layer of the actor network (an architectural sketch follows this list). The controller, named Actor-Critic MPC (AC-MPC), reaches super-human performance comparable to model-free RL, appears more robust and generalizable, and requires fewer samples. However, training is still slower due to inference through the differentiable MPC block.
- We presented the first neural-network controller, trained via differentiable simulation, that stabilizes a quadrotor from agile flight directly from visual features, without explicit state estimation.
- We presented an application of our drone-racing research to power-line inspection. The paper received the Best Paper Award at IROS.
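As an illustration of the point about task-level objectives above, the following sketch contrasts a sparse, non-differentiable gate-passing reward (which RL can optimize directly) with the smooth quadratic tracking cost a classic optimal controller requires. The function names, gate radius, and weights are hypothetical, not the formulation used in the Science Robotics paper.

```python
import numpy as np

GATE_RADIUS = 0.5  # [m] hypothetical gate acceptance radius

def task_level_reward(position, gate_centers, next_gate_idx):
    """+1 only when the drone passes the next gate; 0 otherwise.

    This signal is sparse and non-differentiable with respect to the
    trajectory, so it does not constrain *how* the gate is reached:
    RL is free to discover any behavior that passes the gates.
    """
    if np.linalg.norm(position - gate_centers[next_gate_idx]) < GATE_RADIUS:
        return 1.0, next_gate_idx + 1   # gate passed, advance to the next one
    return 0.0, next_gate_idx

def tracking_cost(state, reference_state, Q):
    """Quadratic tracking cost typical of a classic optimal controller.

    It requires a differentiable objective and a precomputed reference
    trajectory, which limits the behaviors the controller can express.
    """
    error = state - reference_state
    return float(error @ Q @ error)
```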
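The Actor-Critic MPC architecture mentioned above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the actor's last layer outputs the parameters of an MPC cost (a reference state and positive weights), and a simplified one-step "MPC" solve, written in closed form so gradients can flow through it, turns them into a control command. The real system uses a full differentiable MPC over a quadrotor model; the linear dynamics and dimensions here are placeholders.

```python
import torch
import torch.nn as nn

class ActorCriticMPC(nn.Module):
    """Hypothetical sketch of the Actor-Critic MPC idea (not the authors' code).

    The actor does not output actions directly: its last layer produces the
    parameters of an MPC cost, and a differentiable MPC layer converts them
    into an action, so RL gradients flow through the solver into the actor.
    """

    def __init__(self, obs_dim: int, state_dim: int, act_dim: int):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.Tanh(),
            nn.Linear(128, 2 * state_dim),   # -> [reference state, log-weights]
        )
        # Toy linear dynamics x' = A x + B u, standing in for the quadrotor
        # model used inside the real differentiable MPC block.
        self.register_buffer("A", torch.eye(state_dim))
        self.register_buffer("B", 0.1 * torch.randn(state_dim, act_dim))

    def forward(self, obs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        ref, log_w = self.actor(obs).chunk(2, dim=-1)
        w = torch.exp(log_w)                          # positive cost weights
        # One-step stand-in for the differentiable MPC:
        #   u* = argmin_u || diag(sqrt(w)) (A x + B u - ref) ||^2
        # solved via regularized normal equations, so obs -> u is differentiable.
        Bw = torch.sqrt(w).unsqueeze(-1) * self.B
        residual = torch.sqrt(w) * (ref - self.A @ state)
        H = Bw.T @ Bw + 1e-6 * torch.eye(self.B.shape[-1])
        return torch.linalg.solve(H, Bw.T @ residual)

# Example with a single (unbatched) observation and state.
policy = ActorCriticMPC(obs_dim=16, state_dim=6, act_dim=4)
u = policy(torch.randn(16), torch.randn(6))
print(u.shape)   # torch.Size([4])
```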
**Event Cameras**
- We presented methods that reduce the computational complexity of computer-vision algorithms for neuromorphic event cameras by a factor of 200, based on an algorithm we coined Asynchronous Graph Neural Networks (a sketch of the underlying event-graph construction follows this list).
- We addressed the current shortage of datasets to train deep networks for event cameras by using unsupervised domain adaptation to transfer labels from standard images to events.
- We proposed the first Recurrent Vision Transformers for object detection with event cameras, achieving for the first time an object-detection latency below 10 ms at accuracy comparable to the state of the art, with applications to automotive traffic-participant detection.
- We proposed the first data-driven, learning-based feature tracker for event cameras. Thanks to deep learning, the resulting feature tracks are up to twice as long as those of model-based approaches and exhibit lower latency. The paper was a Best Paper Award candidate at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), with a candidate selection rate of 0.5%.
- We demonstrated that, thanks to event cameras, a quadruped robot can catch objects tossed from 4 m away at relative speeds of up to 15 m/s.
- We presented the first NeRF (neural radiance field) resilient to large motion blur, thanks to event cameras.
- We presented the first approach that achieves an unprecedented 0.2 ms latency for traffic-participant detection on automotive datasets, thanks to event cameras. The paper was published in Nature [2] and is the first paper on event cameras ever published in Nature.
- We presented the first low-latency forest navigation of a drone with an event camera.
- We presented the first combination of event cameras and SPAD (single-photon avalanche diode) sensors for low-latency, low-light computational photography.
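The Asynchronous Graph Neural Networks mentioned above operate on a graph built incrementally from the event stream. The sketch below shows the general idea of this event-graph construction (the radii and insertion routine are illustrative assumptions, not the published implementation): each incoming event becomes a node connected to recent, nearby events, so only a local neighborhood needs to be updated per event, which is where the large computational savings come from.

```python
import numpy as np

R_SPACE = 3.0   # [px] hypothetical spatial neighborhood radius
R_TIME = 0.01   # [s]  hypothetical temporal neighborhood radius

def insert_event(nodes, edges, event):
    """Insert one event (x, y, t, polarity) into the graph.

    The new node is connected only to nearby past events, mirroring the
    event-by-event (asynchronous) update of the network: the rest of the
    graph is left untouched.
    """
    x, y, t, p = event
    new_idx = len(nodes)
    for i, (xi, yi, ti, _) in enumerate(nodes):
        if abs(t - ti) < R_TIME and np.hypot(x - xi, y - yi) < R_SPACE:
            edges.append((i, new_idx))
    nodes.append(event)
    return new_idx

# Example: stream a few synthetic events through the graph.
nodes, edges = [], []
for ev in [(10, 12, 0.000, 1), (11, 12, 0.002, 1), (40, 5, 0.003, 0)]:
    insert_event(nodes, edges, ev)
print(len(nodes), "nodes,", len(edges), "edges")
```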
The results were published in top robotics journals, such as Science Robotics and IEEE Transactions on Robotics, and have received worldwide media coverage, including in The Economist, Forbes, and IEEE Spectrum.
References:
[1] Kaufmann, Bauersfeld, Loquercio, Mueller, Koltun, Scaramuzza, Champion-Level Drone Racing using Deep Reinforcement Learning, Nature, 2023.
[2] Gehrig, Scaramuzza, Low-Latency Automotive Vision with Event Cameras, Nature, 2024.