
Low-latency Perception and Action for Agile Vision-based Flight

Periodic Reporting for period 3 - AGILEFLIGHT (Low-latency Perception and Action for Agile Vision-based Flight)

Reporting period: 2023-09-01 to 2025-02-28

Drones are disrupting industries ranging from bridge and power-line inspection, videography and filmography, and agriculture to law enforcement, delivery, inventory management, and search and rescue. According to a report by Grand View Research published in September 2021 [1], the global commercial drone market is valued at $24 billion and is expected to reach $500 billion by 2028. However, commercial drones are still operated by expert human pilots. Autonomous drones that can substitute for human pilots in terms of maneuverability and agility in unknown, GPS-denied, complex, and potentially dynamic environments do not yet exist. Such autonomy is crucial for several reasons: it minimizes the risk of injuries and damage, maximizes the efficiency and effectiveness of the operation [2], and allows access to remote areas beyond the remote controller's communication range.

The overarching goal of this project is to develop scientific methods for flying autonomous, vision-based drones with performance comparable to or even surpassing that of human pilots (i.e. human-level or superhuman-level performance), using onboard standard or neuromorphic event cameras and onboard computation. We argue that the main reason traditional model-based approaches have not achieved human-level performance is that they are slow and sensitive to imperfect perception and to unmodeled effects, notably aerodynamics. We propose that combining model-based approaches with deep networks, together with the low latency and high temporal resolution of neuromorphic event cameras, can address these challenges.

[1] Grand View Research, Commercial drone market size, share and trends analysis, 2021: https://www.grandviewresearch.com/industry-analysis/global-commercial-drones-market
[2] L. Bauersfeld, D. Scaramuzza, Range, Endurance, and Optimal Speed Estimates for Multicopters, Robotics and Automation Letters (RAL), 2022: https://rpg.ifi.uzh.ch/docs/RAL22_Bauersfeld.pdf
In this 5-year project, we have made significant progress in the autonomous navigation of agile vision-based quadcopters and in event-camera research. The results were published in leading venues, including Nature, Science Robotics, IEEE Transactions on Robotics, and IEEE Transactions on Pattern Analysis and Machine Intelligence, and have received worldwide media coverage from outlets including The Economist, Forbes, and IEEE Spectrum. We won numerous awards, including career awards and paper awards.
**Autonomous Navigation of Agile Vision-based Quadcopters**

We focused on learning agile end-to-end control policies for quadcopters, trained entirely in simulation with either zero or minimal fine-tuning in the real world. Two applications were targeted: 1) agile navigation in unknown environments (e.g. forests, search-and-rescue scenarios) and 2) autonomous drone racing:

- We demonstrated flight at speeds of up to 40 km/h, faster than ever before, in cluttered forest environments, snowy terrain, and search-and-rescue environments.

- We presented a neural-network controller, trained via deep reinforcement learning rather than with classic control techniques, that races vision-based autonomous quadcopters at speeds competitive with human world champions and can even outfly them. The paper was published in Nature and featured in The Guardian.

- We investigated why reinforcement learning (RL) outperforms optimal control at racing. The key finding is that RL can directly optimize a sparse, task-level, non-differentiable objective function, which allows it to discover new, robust control behaviors (a minimal sketch of such an objective is given after this list). The paper was published in Science Robotics.

- We presented the first end-to-end vision-based neural-network controller trained via RL that can fly a drone fast through a racing course without explicit state estimation, without an IMU, and without SLAM. The paper was one of the award-winning papers at RSS 2024.

- We presented the first visual-odometry/SLAM (VO-SLAM) algorithm that uses RL to tune its parameters at deployment time. This algorithm was transferred to NASA JPL for the next Mars helicopter mission.

- We presented the first optimal controller (MPC) whose cost function is learned via RL, obtained by stacking a differentiable MPC after the last layer of the actor network (see the sketch after this list). The controller, named Actor-Critic MPC (AC-MPC), reaches superhuman performance comparable to that of model-free RL, appears more robust and generalizable than model-free RL, and uses fewer samples. However, training is still slower because of the inference cost of the differentiable MPC block.

- We presented the first neural-network controller, trained via differentiable simulation, that stabilizes a quadrotor from agile flight directly from visual features, without explicit state estimation.

- We presented an application of our drone-racing research to power-line inspection. The paper received the Best Paper Award at IROS.
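
To illustrate the objective mentioned in the reinforcement-learning item above, the following is a minimal, hypothetical sketch of a sparse, task-level racing reward: it is non-zero only at discrete, non-differentiable events (passing a gate, crashing), which is the kind of objective RL can optimize directly while gradient-based optimal control cannot. The gate layout, pass radius, and crash penalty are illustrative assumptions, not the reward used in the project.

```python
import numpy as np

# Hypothetical sketch of a sparse, task-level racing reward: the agent is
# rewarded only at discrete events (passing the next gate, crashing).
# Gate positions, radius, and penalty are illustrative assumptions.

GATES = np.array([[5.0, 0.0, 2.0],   # gate centres (x, y, z) in metres
                  [10.0, 4.0, 2.5],
                  [15.0, 0.0, 2.0]])
GATE_RADIUS = 1.0       # pass tolerance around a gate centre [m]
CRASH_PENALTY = -10.0   # terminal penalty on collision


def sparse_race_reward(position, next_gate_idx, crashed):
    """Return (reward, updated_next_gate_idx, done) for one simulation step."""
    if crashed:
        return CRASH_PENALTY, next_gate_idx, True
    if next_gate_idx >= len(GATES):                 # course already finished
        return 0.0, next_gate_idx, True
    dist = np.linalg.norm(position - GATES[next_gate_idx])
    if dist < GATE_RADIUS:                          # gate passed: sparse +1
        next_gate_idx += 1
        return 1.0, next_gate_idx, next_gate_idx >= len(GATES)
    return 0.0, next_gate_idx, False                # nothing happened this step


# Example step: the drone is ~0.4 m from the centre of the first gate.
r, gate_idx, done = sparse_race_reward(np.array([4.7, 0.3, 2.0]), 0, False)
print(r, gate_idx, done)   # 1.0 1 False
```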
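
Likewise, the Actor-Critic MPC idea of placing a differentiable controller after the actor network can be pictured with a highly simplified, hypothetical sketch: the actor outputs cost weights, and the "MPC" is stood in for by a few unrolled gradient steps on a toy linear-quadratic problem, so that gradients of a downstream loss flow back into the actor. The dimensions, dynamics, and inner solver here are illustrative assumptions, not the project's implementation, which stacks a proper differentiable MPC solver after the actor.

```python
import torch
import torch.nn as nn

# Highly simplified, hypothetical sketch of the actor-critic-MPC structure:
# the actor outputs cost weights, and a differentiable "controller" (here a
# few unrolled gradient steps on a toy linear-quadratic problem, standing in
# for a real differentiable MPC solver) maps them to an action. Gradients of
# a downstream loss flow through the controller back into the actor.

STATE_DIM, ACT_DIM, HORIZON = 4, 2, 5


class ActorCostNet(nn.Module):
    """Maps the state to a positive diagonal state-cost weighting."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                                 nn.Linear(32, STATE_DIM))

    def forward(self, x):
        return nn.functional.softplus(self.net(x))   # keep cost weights > 0


def differentiable_controller(x0, cost_weights, A, B, steps=20, lr=0.1):
    """Toy stand-in for a differentiable MPC: unrolled gradient descent on a
    finite-horizon quadratic cost over a linear model, kept differentiable
    with respect to cost_weights."""
    u = torch.zeros(HORIZON, ACT_DIM, requires_grad=True)
    for _ in range(steps):
        x, total_cost = x0, 0.0
        for t in range(HORIZON):
            x = x @ A.T + u[t] @ B.T                           # roll out dynamics
            total_cost = (total_cost + (cost_weights * x**2).sum()
                          + 1e-2 * (u[t]**2).sum())            # state + control cost
        (grad_u,) = torch.autograd.grad(total_cost, u, create_graph=True)
        u = u - lr * grad_u                                    # unrolled inner step
    return u[0]                                                # apply first action


# Toy usage: the gradient of a stand-in loss reaches the actor's weights.
A, B = torch.eye(STATE_DIM), torch.randn(STATE_DIM, ACT_DIM) * 0.1
actor = ActorCostNet()
x0 = torch.randn(STATE_DIM)
action = differentiable_controller(x0, actor(x0), A, B)
action.sum().backward()                                        # stand-in RL loss
print(action.detach(), actor.net[0].weight.grad.norm())
```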


**Event Cameras**

- We presented methods that reduce the computational complexity of computer-vision algorithms for neuromorphic event cameras by a factor of 200, thanks to an algorithm we call Asynchronous Graph Neural Networks (a toy sketch of the asynchronous-update idea follows this list).

- We addressed the current shortage of datasets to train deep networks for event cameras by using unsupervised domain adaptation to transfer labels from standard images to events.

- We proposed the first recurrent vision transformers for object detection with event cameras, achieving for the first time an object-detection latency below 10 ms with accuracy comparable to the state of the art.

- We proposed the first data-driven feature tracker for event cameras. We demonstrated that, thanks to deep learning, its feature tracks are up to twice as long as those of model-based approaches and exhibit lower latency. The paper was selected as an award candidate at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR; award-candidate selection rate 0.5%).

- We demonstrated that, thanks to event cameras, a quadruped robot can catch objects tossed from 4 m away at relative speeds of up to 15 m/s.

- We presented the first NeRF (neural radiance field) approach resilient to large motion blur, thanks to event cameras.

- We presented the first event-camera paper to achieve an unprecedented 0.2 ms latency for traffic-participant detection on automotive datasets. The paper was published in Nature and is the first paper on event cameras ever published there.

- We presented the first low-latency forest navigation of a drone with an event camera.

- We presented the first combination of event cameras and single-photon avalanche diode (SPAD) sensors for low-latency, low-light computational photography.
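
To give a flavour of the asynchronous processing mentioned in the Asynchronous Graph Neural Networks item above, the hypothetical sketch below inserts events one at a time into a spatio-temporal graph and recomputes features only within each new event's neighbourhood, so the per-event work stays local instead of growing with the whole graph. The feature rule, neighbourhood radius, and data layout are illustrative assumptions, not the project's algorithm.

```python
import numpy as np

# Hypothetical sketch of asynchronous, per-event graph updates: each incoming
# event only triggers recomputation inside its local spatio-temporal
# neighbourhood rather than over the whole graph. The feature rule, radius,
# and units are illustrative assumptions.

RADIUS = 3.0   # neighbourhood radius over (x [px], y [px], t [ms]) jointly


class EventGraph:
    def __init__(self):
        self.nodes = np.empty((0, 3))      # one row per event: (x, y, t)
        self.features = np.empty((0,))     # one scalar feature per node

    def insert_event(self, x, y, t):
        """Insert one event and update only its neighbourhood."""
        new_node = np.array([[x, y, t]])
        if len(self.nodes):
            dists = np.linalg.norm(self.nodes - new_node, axis=1)
            neighbours = np.flatnonzero(dists < RADIUS)
        else:
            neighbours = np.array([], dtype=int)
        # Local "message passing": mix information within the neighbourhood only.
        new_feat = self.features[neighbours].sum() / (len(neighbours) + 1)
        self.features[neighbours] = 0.5 * self.features[neighbours] + 0.5 * new_feat
        self.nodes = np.vstack([self.nodes, new_node])
        self.features = np.append(self.features, new_feat)
        return len(neighbours)             # per-event work ~ neighbourhood size


# Feed a small synthetic event stream; the cost of each insertion stays local.
graph = EventGraph()
rng = np.random.default_rng(0)
for i in range(100):
    touched = graph.insert_event(rng.uniform(0, 64), rng.uniform(0, 64), 0.05 * i)
print("nodes:", len(graph.nodes), "| last event touched", touched, "neighbours")
```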

The results were published in top robotics journals, such as Science Robotics and IEEE Transactions on Robotics, and have received worldwide media coverage from outlets including The Economist, Forbes, and IEEE Spectrum.

References:

[1] Kaufmann, Bauersfeld, Loquercio, Mueller, Koltun, Scaramuzza, Champion-Level Drone Racing using Deep Reinforcement Learning, Nature, 2023.
[2] Gehrig, Scaramuzza, Low Latency Automotive Vision with Event Cameras, Nature, 2024.