Periodic Reporting for period 2 - DiLeBaCo (Distributed Learning-Based Control for Multi-Agent Systems)
Reporting period: 2021-04-07 to 2022-04-06
The challenge in controlling autonomous systems lies in their increasing complexity, which stems from the interactions between multiple autonomous agents in a system and from the complex, dynamic environments in which they operate. As a result, state-of-the-art classical control methods are either overly conservative, leading to poor control performance, or they cannot guarantee safety. Moreover, since these classical methods rely on analytical models, they may not be applicable at all when such models are unavailable.
The objective of the research project is therefore to fuse methods from machine learning with control approaches in order to guarantee both the performance and the safety of the controlled systems. In particular, previously seen data (from experience) or simulated data (rollouts) are used to learn information that is missing due to the complexity of the systems, namely about the optimal control policies, about the dynamical model of the systems, or about the complex and dynamic environment in which they navigate. The specific goals of the project are to 1) develop novel control algorithms by fusing methods from machine learning with control approaches, 2) guarantee the safety and performance of the developed algorithms, and 3) ensure the data efficiency, scalability and computational efficiency of these methods, such that they can be applied online, in real time, and to complex multi-agent systems.
The project has shown that for complex systems (involving multiple coupled agents, dynamic environments and safety-critical requirements), combining machine learning methods with the framework of model predictive control has great potential to substantially improve the performance of classical control algorithms while also providing safety guarantees. This direction should be pursued further in order to bring high-performing and safe algorithms to relevant real-world applications such as heavy-duty platooning, autonomous driving or robotic networks.
Data-driven methods have been developed to learn optimal control policies for complex interconnected and multi-agent systems, leading to locally optimal performance and safe control behavior. The results have been validated in extensive simulations of distributed linear systems and of multi-agent nonlinear systems. For the latter, an example application is a time-optimal navigation task in which multiple agents must reach a desired goal position while avoiding collisions with all other agents. The optimal control policies are learned iteratively from previously seen data and applied in a decentralized way (without communication between the agents). The result is a scalable data-driven control method (applicable to a large number of subsystems) with locally optimal control performance and guaranteed safety (collision avoidance).
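The report does not spell out how policies are learned from previously seen data. One common way to do this in iterative, learning-based model predictive control, assumed here purely for illustration, is to store every closed-loop trajectory, label each visited state with the cost-to-go that was actually realized from it, and reuse these labelled states as a terminal safe set and terminal cost in the next iteration. For a time-optimal task the stage cost is 1 per step, so the cost-to-go is simply the number of remaining steps. In a decentralized setting, each agent could keep its own instance built only from its own recorded data. The class `SampledSafeSet` and its methods are hypothetical names, and this is a minimal sketch rather than the project's method.

```python
import numpy as np


class SampledSafeSet:
    """Hypothetical store of data from previous iterations (sketch only)."""

    def __init__(self):
        self.states = []       # states visited in earlier successful iterations
        self.cost_to_go = []   # realized cost-to-go from each stored state

    def add_trajectory(self, traj, stage_costs):
        """traj: (N+1, n) closed-loop states ending at the goal;
        stage_costs: (N,) cost incurred at each step."""
        # Suffix sums give the cost-to-go at each step; the goal state has cost 0.
        ctg = np.concatenate([np.cumsum(stage_costs[::-1])[::-1], [0.0]])
        for x, c in zip(traj, ctg):
            self.states.append(np.asarray(x, dtype=float))
            self.cost_to_go.append(float(c))

    def terminal_cost(self, x, tol=1e-9):
        """Smallest stored cost-to-go at x, or inf if x was never visited
        (i.e. x is not in the sampled safe set)."""
        best = np.inf
        for s, c in zip(self.states, self.cost_to_go):
            if np.linalg.norm(s - np.asarray(x, dtype=float)) <= tol:
                best = min(best, c)
        return best
```

Because each new iteration only adds data, the stored cost-to-go at a revisited state can only decrease, which is what allows performance to improve iteratively while the stored states remain known-safe terminal targets.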
Furthermore, a hierarchical control framework has been developed in which previously recorded data is used to learn a higher-level strategy that guides the lower-level optimization problem. The advantage is that the underlying optimization problem is less complex and can therefore be solved online. In addition, both good control performance and the safety of the controlled system, the latter ensured through a finite state machine, are guaranteed, even for control tasks that require navigating tightly constrained and dynamic environments shared with human-driven cars. One control scenario considered is autonomous driving in a tight parking lot, where human-driven cars are moving around and parking in empty spots. Since the environment is dynamic and tightly constrained, the exact optimization problem for controlling the autonomous car is too complex to be solved online in real time. The performance and safety of the novel hierarchical control framework for this task were validated in extensive simulations and in experiments at UC Berkeley.
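The report states that safety is guaranteed through a finite state machine but does not describe its modes or transitions. The sketch below is a hypothetical three-mode supervisor in which the learned high-level strategy is only followed while a simple safety check passes; the mode names, the distance thresholds `d_safe` and `d_emergency`, and the function `next_mode` are all illustrative assumptions.

```python
from enum import Enum, auto


class Mode(Enum):
    FOLLOW_STRATEGY = auto()  # solve the low-level problem guided by the learned strategy
    YIELD = auto()            # hold position and wait for the other vehicle to move on
    EMERGENCY_STOP = auto()   # brake at maximum deceleration


def next_mode(mode, min_distance, strategy_feasible,
              d_safe=1.0, d_emergency=0.3):
    """One transition of a hypothetical safety supervisor.

    min_distance: current distance [m] to the closest other vehicle
    strategy_feasible: whether the low-level optimization guided by the
        learned strategy returned a feasible solution
    d_safe, d_emergency: hypothetical distance thresholds [m]
    """
    if min_distance < d_emergency:
        return Mode.EMERGENCY_STOP  # any mode can jump to emergency stop
    if mode is Mode.EMERGENCY_STOP and min_distance < d_safe:
        return Mode.EMERGENCY_STOP  # stay stopped until clearance is restored
    if not strategy_feasible or min_distance < d_safe:
        return Mode.YIELD           # fall back when the guided problem fails or clearance is low
    return Mode.FOLLOW_STRATEGY
```

The design choice illustrated here is that the learned component only proposes; a small, verifiable set of rules decides whether its proposal is used, so safety does not depend on the quality of the learned strategy.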
These methods have further been combined with the distributed methods developed for optimal data generation and applied to the problem of platooning in mixed traffic conditions.
To cope with complex dynamical systems that cannot be modeled analytically, methods have been developed that replace the analytical model with a purely data-driven representation based on matrix zonotopes from reachability theory. This data-driven representation requires only one pair of input-output trajectories from the system. The resulting algorithm, called ZPC (zonotopic predictive control), can cope with noisy data, and robust safety guarantees have been provided for it.
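The report does not give the construction of the matrix zonotope. As a minimal sketch, assuming a linear system x_{k+1} = A x_k + B u_k + w_k with a fully measured state and noise w_k contained in a known zonotope Z_w (all assumptions not stated in the report), the set of all models [A B] consistent with one recorded trajectory can be written as the matrix zonotope M = (X_+ - M_w) [X_-; U_-]^†, where M_w is the matrix zonotope of all noise sequences allowed by Z_w and † is the Moore-Penrose pseudoinverse. The function `matrix_zonotope_from_data` and its arguments are hypothetical names.

```python
import numpy as np


def matrix_zonotope_from_data(X, U, w_center, w_generators):
    """Set of models [A B] consistent with data (sketch, linear-system assumption).

    X: (n, T+1) measured states x_0..x_T
    U: (m, T) applied inputs u_0..u_{T-1}
    w_center: (n,) center of the noise zonotope Z_w
    w_generators: list of (n,) generator vectors of Z_w
    Returns the center (n, n+m) and a list of generator matrices (n, n+m).
    """
    n, T = X.shape[0], U.shape[1]
    X_minus, X_plus = X[:, :-1], X[:, 1:]
    H = np.vstack([X_minus, U])  # stacked data matrix [X_-; U_-], shape (n+m, T)
    if np.linalg.matrix_rank(H) < H.shape[0]:
        raise ValueError("[X_-; U_-] must have full row rank (data not rich enough)")
    H_pinv = np.linalg.pinv(H)   # shape (T, n+m)

    # Center of M_w repeats the noise center in every column.
    C_w = np.tile(np.asarray(w_center, dtype=float).reshape(-1, 1), (1, T))
    center = (X_plus - C_w) @ H_pinv

    # Each noise generator placed in a single column j yields one generator of M_w.
    generators = []
    for g in w_generators:
        for j in range(T):
            G = np.zeros((n, T))
            G[:, j] = g
            generators.append(-G @ H_pinv)
    return center, generators
```

If the true noise lies in Z_w and [X_-; U_-] has full row rank, then [A B] = (X_+ - W) [X_-; U_-]^† for the actual noise sequence W, so the true model is contained in the returned set. A predictive controller could then propagate reachable sets with this set of models instead of a single nominal model; the exact ZPC optimization problem is not reproduced here.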
The work carried out enables efficient and safe control of multi-agent systems. The developed methods have been used in the research project on heavy-duty platooning at KTH. In this setting, safety in terms of collision avoidance is indispensable, while optimal performance in terms of fuel savings and travel time is highly desirable. This application area has tremendous potential to increase road safety and to reduce fuel consumption, thus directly contributing to European policy objectives.