CORDIS - EU research results

Distributed Learning-Based Control for Multi-Agent Systems

Periodic Reporting for period 2 - DiLeBaCo (Distributed Learning-Based Control for Multi-Agent Systems)

Reporting period: 2021-04-07 to 2022-04-06

Multi-agent systems offer tremendous potential to improve quality of life in modern society. For instance, robotic networks will increase food production or engage in search and rescue missions, and fleets of autonomous cars will reduce traffic congestion and fuel consumption while increasing road safety. With 45% of all freight being transported by road, transportation makes up 26% of total EU energy consumption and accounts for 18% of greenhouse gas emissions. Fuel reduction in this area will therefore have a significant impact on the environment. This will be achieved through platooning, in which heavy-duty vehicles drive close to each other in groups, reducing their aerodynamic drag and thus increasing their fuel efficiency.

The challenge in controlling autonomous systems is their increasing complexity, which stems from the interactions between multiple autonomous agents in a system and from the complex, dynamic environments in which they operate. As a result, state-of-the-art classical control methods are either overly conservative, leading to poor control performance, or unable to guarantee safety. Moreover, because these classical methods rely on analytical models, they may not be applicable at all to systems for which such models cannot be obtained.

The objective of the research project is therefore to fuse methods from machine learning with control approaches in order to guarantee both performance and safety of the controlled systems. In particular, previously seen data (from experience) or simulated data (rollouts) are used to learn missing information arising from the complexity of the systems, i.e. information about the optimal control policies, about the dynamical model of the systems, or about the complex and dynamic environment in which they navigate. The specific goals of the project are 1) to develop novel control algorithms by fusing methods from machine learning with control approaches, 2) to guarantee safety and performance of the developed algorithms, and 3) to focus on data efficiency, scalability and computational efficiency of these methods, so that they can be applied online in real time and to complex multi-agent systems.

The project has shown that, for complex systems (multiple coupled agents, dynamic environments and safety-critical settings), combining machine learning methods with the framework of model predictive control has great potential to substantially increase the performance of classical control algorithms while at the same time providing safety guarantees. This direction should be exploited further in order to bring high-performing and safe algorithms to relevant real-world applications in areas such as heavy-duty platooning, autonomous driving and robotic networks.
This research project focuses on complex autonomous systems, where the complexity arises mainly from interactions between multiple agents, from tightly constrained and dynamic environments, and from interactions with humans. Classical model-based robust control methods can be overly conservative in the face of this complexity, or even impossible to implement. Therefore, data-driven control methods have been developed. In particular, methods from machine learning have been fused with control algorithms in order to achieve high performance and guarantee safety for the controlled complex systems.

Data-driven methods have been developed to learn optimal control policies for complex interconnected and multi-agent systems, leading to (locally) optimal performance and safe control behavior. The results have been validated in extensive simulations of distributed linear systems and of multi-agent nonlinear systems. For the latter, an application example is a time-optimal navigation task in which multiple agents move to a desired goal position while avoiding collisions with all other agents. The optimal control policies are learned iteratively from previously seen data and employed in a decentralized way (without communication between the agents). The result is a scalable data-driven control method (applicable to a large number of subsystems) with locally optimal control performance and guaranteed safety (collision avoidance).
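
The idea of learning a policy iteratively from previously seen data can be illustrated with a minimal single-agent sketch in the spirit of learning model predictive control (LMPC). It is not the project's algorithm: the double-integrator model, the goal, the horizon, the constraints and all function names are assumptions made for illustration, and collision avoidance between agents is left out. States visited in earlier successful runs form a "safe set" with a known cost-to-go, and each new run plans a few steps ahead with the requirement that the plan ends inside that set.

```python
from itertools import product

GOAL = (10, 0)        # hypothetical target: position 10, at rest
INPUTS = (-1, 0, 1)   # admissible accelerations (assumed)
HORIZON = 3           # prediction horizon in steps (assumed)
V_MAX = 3             # speed limit (assumed)


def step(state, u):
    """Discrete double integrator: position advances by velocity, velocity by input."""
    p, v = state
    return (p + v, v + u)


def feasible(state):
    p, v = state
    return 0 <= p <= GOAL[0] and abs(v) <= V_MAX


def stage_cost(state):
    """Time-optimal cost: one unit per step spent away from the goal."""
    return 0 if state == GOAL else 1


def add_to_safe_set(safe_set, trajectory):
    """Record every visited state with the best cost-to-go seen so far."""
    cost_to_go = 0
    for state in reversed(trajectory):
        cost_to_go += stage_cost(state)
        safe_set[state] = min(cost_to_go, safe_set.get(state, cost_to_go))


def plan(state, safe_set):
    """Enumerate input sequences; the terminal state must lie in the safe set."""
    best_total, best_u = None, None
    for seq in product(INPUTS, repeat=HORIZON):
        x, cost, ok = state, 0, True
        for u in seq:
            cost += stage_cost(x)
            x = step(x, u)
            if not feasible(x):
                ok = False
                break
        if ok and x in safe_set:
            total = cost + safe_set[x]
            if best_total is None or total < best_total:
                best_total, best_u = total, seq[0]
    return best_u


def run_iteration(start, safe_set, max_steps=50):
    """Closed-loop run: replan at every step and apply only the first input."""
    trajectory = [start]
    while trajectory[-1] != GOAL and len(trajectory) <= max_steps:
        u = plan(trajectory[-1], safe_set)
        if u is None:
            raise RuntimeError("no feasible plan ends in the safe set")
        trajectory.append(step(trajectory[-1], u))
    return trajectory


# Iteration 0: a slow but safe demonstration (accelerate once, cruise, brake once).
trajectory = [(0, 0)] + [(p, 1) for p in range(10)] + [GOAL]
safe_set = {}
add_to_safe_set(safe_set, trajectory)
costs = [len(trajectory) - 1]

for _ in range(3):
    trajectory = run_iteration((0, 0), safe_set)
    add_to_safe_set(safe_set, trajectory)
    costs.append(len(trajectory) - 1)

print(costs)  # number of steps to the goal in each iteration
```

Because every plan must end in a state from which a stored trajectory is known to reach the goal, each run stays feasible and takes no longer than the previous one. A decentralized variant would run one such planner per agent, which this sketch does not attempt.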

Furthermore, a hierarchical control framework has been developed in which previously recorded data is used to learn a higher-level strategy that guides the lower-level optimization problem. The advantage is that the underlying optimization problem is less complex and can thus be solved online. Both good control performance and the safety of the controlled system are guaranteed, the latter through a finite state machine, even for control tasks that require navigating tightly constrained and dynamic environments shared with human-driven cars. One control scenario considered is autonomous driving in a tight parking lot where human-driven cars are moving and parking in empty spots. Since the environment is dynamic and tightly constrained, the exact optimization problem for controlling the autonomous car is too complex to be solved online in real time. The performance and safety of the novel hierarchical control framework for this control task were validated in extensive simulations and in experiments at UC Berkeley.
These methods have further been fused with the developed distributed methods for optimal data generation, and have been applied to the problem of platooning in mixed traffic conditions.
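
How a finite state machine can act as a safety layer on top of a learned strategy can be sketched as follows. This summary does not describe the project's actual states, transitions or thresholds, so the three modes, the two gap thresholds and the names `next_mode` and `speed_command` are all hypothetical. The supervisor follows the planned motion only while the gap to the nearest obstacle is large and the lower-level optimization reports success; otherwise it falls back to a slower or stopped mode.

```python
from enum import Enum

SAFE_GAP = 2.0   # hypothetical gap (m) above which nominal driving may resume
STOP_GAP = 0.5   # hypothetical gap (m) below which the car must stop


class Mode(Enum):
    FOLLOW_STRATEGY = "follow_strategy"   # track the learned high-level strategy
    YIELD = "yield"                       # creep forward with a reduced speed cap
    EMERGENCY_STOP = "emergency_stop"     # stand still


def next_mode(mode, gap, solver_ok):
    """One transition of the supervisor, given the gap and the solver status."""
    if gap < STOP_GAP:
        return Mode.EMERGENCY_STOP
    if mode is Mode.EMERGENCY_STOP and gap < SAFE_GAP:
        # Hysteresis: remain stopped until the gap has fully recovered.
        return Mode.EMERGENCY_STOP
    if not solver_ok or gap < SAFE_GAP:
        return Mode.YIELD
    return Mode.FOLLOW_STRATEGY


def speed_command(mode, planned_speed):
    """Speed actually sent to the vehicle in each mode."""
    if mode is Mode.FOLLOW_STRATEGY:
        return planned_speed
    if mode is Mode.YIELD:
        return min(planned_speed, 0.5)
    return 0.0


# Made-up sequence of (gap to nearest obstacle, lower-level solver succeeded).
observations = [(5.0, True), (1.5, True), (0.3, True),
                (1.0, True), (3.0, False), (3.0, True)]
mode = Mode.FOLLOW_STRATEGY
history = []
for gap, solver_ok in observations:
    mode = next_mode(mode, gap, solver_ok)
    history.append((mode, speed_command(mode, planned_speed=2.0)))

for entry in history:
    print(entry)
```

Keeping the safety logic in a small, explicit state machine means the fallback behavior can be inspected independently of the learned component.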

To cope with complex dynamical systems that cannot be modeled analytically, methods have been developed that replace the analytical model with a purely data-driven representation based on matrix zonotopes from reachability theory. The data-driven representation needs only one pair of input-output trajectories from the system. The algorithm, called ZPC (zonotopic predictive control), can cope with noisy data, and robust safety guarantees have been provided for it.
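
A scalar sketch can show how a single noisy trajectory yields a set of models rather than one model. It assumes a system x+ = a*x + b*u + w with a known noise bound and measured states, and it uses one construction from the data-driven reachability literature: the set of consistent models is (X+ - W) D^+, where D stacks past states and inputs, D^+ is its right pseudo-inverse and W ranges over the admissible noise. Whether ZPC uses exactly this form is an assumption, and the system parameters, the noise bound and all names below are illustrative.

```python
import random

W_MAX = 0.05  # assumed known bound on the process noise, |w| <= W_MAX


def collect_data(a, b, inputs, x0, rng):
    """Simulate x+ = a*x + b*u + w once; stands in for a recorded experiment."""
    xs = [x0]
    for u in inputs:
        xs.append(a * xs[-1] + b * u + rng.uniform(-W_MAX, W_MAX))
    return xs


def consistent_models(xs, us):
    """Center and generators of a zonotope containing every [a, b] that fits the data."""
    x_minus, x_plus = xs[:-1], xs[1:]
    sxx = sum(x * x for x in x_minus)
    sxu = sum(x * u for x, u in zip(x_minus, us))
    suu = sum(u * u for u in us)
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-9:
        raise ValueError("data are not informative enough")
    # Rows of the right pseudo-inverse of D = [x_minus; us], one per sample.
    pinv = [((suu * x - sxu * u) / det, (sxx * u - sxu * x) / det)
            for x, u in zip(x_minus, us)]
    center = (sum(xp * r[0] for xp, r in zip(x_plus, pinv)),
              sum(xp * r[1] for xp, r in zip(x_plus, pinv)))
    generators = [(W_MAX * r[0], W_MAX * r[1]) for r in pinv]
    return center, generators


def predict(center, generators, x, u):
    """Interval containing the next state for every model in the set and any noise."""
    mid = center[0] * x + center[1] * u
    radius = sum(abs(g[0] * x + g[1] * u) for g in generators) + W_MAX
    return mid - radius, mid + radius


rng = random.Random(0)
A_TRUE, B_TRUE = 0.9, 0.5   # used only to generate data and to check the result
us = [rng.uniform(-1.0, 1.0) for _ in range(20)]
xs = collect_data(A_TRUE, B_TRUE, us, 1.0, rng)
center, generators = consistent_models(xs, us)
lower, upper = predict(center, generators, x=1.0, u=0.5)
print(center, (lower, upper))
```

The true parameters are guaranteed to lie in the computed set because the actual noise sequence is one of the admissible ones. A predictive controller can then enforce constraints on the whole predicted interval instead of on a single nominal prediction.
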
The developed data-driven optimal control methods achieve global (for linear systems) or local (for nonlinear systems) optimality while guaranteeing safety. This result has not been achieved before by distributed or decentralized control algorithms for general distributed systems. The developed data-driven hierarchical control framework learns strategies from optimal data and uses them to guide a lower-level optimization problem, which can thus be solved online. In this way, a high level of interpretability is maintained. Safety is guaranteed through a finite state machine, and good performance is achieved through the data-driven approach. These advantages of the novel framework have been demonstrated and confirmed in experiments. Furthermore, whereas the methods developed so far are partly based on analytical models of the systems, paired with machine learning techniques, an entirely data-driven optimal control approach has been investigated for systems whose analytical models are very difficult to obtain.

The work carried out enables efficient and safe control of multi-agent systems. The developed methods have been used in the research project on heavy-duty platooning at KTH. Here, safety in terms of collision avoidance is indispensable, while optimal performance in terms of fuel reduction and time optimality is highly desirable. In this application area, there is tremendous potential to increase road safety and to reduce fuel consumption, thus directly contributing towards European policy objectives.
Scheme of the zonotopic predictive control (ZPC)
Complex maneuvers in a tightly constrained parking lot performed by the hierarchical data-driven control framework
Trajectories used for data generation and trajectories produced by the optimized distributed control policies of decentralized LMPC