Optimal Control at Large

Periodic Reporting for period 4 - OCAL (Optimal Control at Large)

Reporting period: 2023-05-01 to 2024-10-31

Optimal control aims to develop decision-making algorithms that extract the maximal benefit out of a dynamical system. Optimal control problems arise in a range of application domains, including energy management (where the aim is to meet energy demand at minimum cost or carbon footprint under constraints imposed by the dynamics of the underlying physical processes) and portfolio optimisation (where the aim could be to maximise return subject to the dynamics and uncertainties of the markets), to name but a few. In the absence of accurate models for the underlying processes, optimal control problems are sometimes treated in a data-driven fashion. This could be in the spirit of reinforcement learning (where optimal decisions are derived by observing the effect of earlier actions and the resulting rewards) or Model Predictive Control (where optimal decisions are derived by using historical system data in lieu of a model). Despite wide-ranging progress on both the theory and applications of optimal control for more than half a century, considerable challenges remain when it comes to applying the resulting methods to large-scale systems. The difficulties become even greater when one moves outside the classical realm of model-based optimal control to address problems where models are replaced by data, or where macroscopic behaviours emerge out of microscopic interactions of large populations of agents.

OCAL addressed precisely this challenge by developing a framework for approximately solving optimal control problems that is both computationally tractable and comes with theoretical approximation guarantees. In the context of approximate dynamic programming, the starting point was the formulation of optimal control problems as linear programs. Since these programs are infinite dimensional for continuous states and actions, we developed randomised methods relying on finite dimensional function approximation and the sampling of constraints as a basis for algorithms. Our approach enjoys close connections to statistical learning theory, providing a direct link to data-driven approximation and resulting in the desired theoretical guarantees. Besides uncovering theoretical properties of these methods, however, our work showed that scaling them up to large-scale systems is far from trivial computationally: empirically, an unreasonably large number of constraints is required to ensure that the approximate linear program remains bounded. We addressed this issue by moving away from random constraint sampling and developing structured, iterative constraint sampling methods. This nicely complemented our parallel work on the approximate solution of dynamic programming problems with finite state and action spaces, where high-performance, parallel software was developed for performing the approximation, drawing on a theoretical connection to non-smooth variants of Newton's method. To demonstrate the efficacy of these methods, in addition to benchmark problems we also applied them to a simulation case study on insulin injection for the treatment of diabetes.
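To make this concrete, the following Python sketch illustrates a sampled approximation of the linear-programming formulation. All specifics here are toy choices of ours for illustration, not the systems studied in the project: the scalar dynamics, stage cost, polynomial features, objective weighting and sample sizes are hypothetical. The value function is approximated as phi(x)^T theta, the Bellman inequalities are imposed only at randomly drawn state-action samples, and the box bounds on theta are a crude safeguard against the unboundedness issue discussed above.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    gamma, n_feat, n_samples = 0.95, 8, 500

    def phi(x):
        # polynomial features for a scalar state (illustrative choice)
        return np.array([x**k for k in range(n_feat)])

    def step(x, u):
        # toy linear dynamics with quadratic stage cost (hypothetical system)
        return 0.9 * x + 0.1 * u, x**2 + 0.1 * u**2

    A_ub, b_ub = [], []
    for x, u in zip(rng.uniform(-1, 1, n_samples), rng.uniform(-1, 1, n_samples)):
        x_next, cost = step(x, u)
        # sampled Bellman inequality: phi(x)^T theta - gamma phi(x')^T theta <= c(x, u)
        A_ub.append(phi(x) - gamma * phi(x_next))
        b_ub.append(cost)

    # maximise the approximate value at representative states (minimise the negative)
    c_obj = -sum(phi(x) for x in rng.uniform(-1, 1, 100))
    # box bounds on theta keep the sampled LP bounded (cf. the discussion above)
    res = linprog(c_obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(-100.0, 100.0)] * n_feat, method="highs")
    theta = res.x                      # weights of the value function approximation
    print("approximate value at x=0.5:", phi(0.5) @ theta)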

In the context of model predictive control, our work focused on the use of data to alleviate the need to develop a model. We showed that, though primarily inspired by deterministic linear problems where it is exact, this data-driven approach can also be used to approximate nonlinear and stochastic problems through regularisation. We were moreover able to establish a close link between the choice of regulariser and the various sources of uncertainty entering the problem. The resulting approximation methodology proved powerful enough to apply to practical problems, both in simulation and in experiments; examples range from quadrotor control in the lab to energy management and urban traffic management.
The general approach we considered involves exploring the environment and collecting input/output data and costs in a reinforcement learning fashion. We can then use the acquired information to formulate a linear program (LP) that returns an approximately optimal policy, encoded through the so-called Q-function. Here we derived approximation schemes that are computationally efficient and at the same time provide explicit probabilistic performance bounds on the quality of the recovered solutions. We also introduced a new contractive operator, the Relaxed Bellman Operator, that can be used to build simpler LPs. In particular, we demonstrated that in the case of linear time-invariant stochastic systems, and for all deterministic systems, the policy we retrieve coincides with the optimal one without approximation. For model-based optimal control we provided a precise interpretation of dynamic programming algorithms as instances of semi-smooth Newton-type methods. This opened the door to the development of novel algorithms, but also to the deployment of advanced numerical solvers to improve scalability, leading to the release of a high-performance, parallelisable toolbox called "madupite", complementing our work on exploiting the parallel computing architecture of GPUs carried out in the first two years of OCAL.
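The Newton connection can be made tangible on a small example. In the Python sketch below (the Markov decision process is randomly generated, purely for illustration), policy iteration is written as a semi-smooth Newton iteration on the Bellman residual F(V) = T(V) - V: the greedy policy selects an element of the generalised Jacobian, and each step solves the resulting linear system exactly (policy evaluation), which is why convergence takes only a handful of iterations.

    import numpy as np

    # synthetic MDP: nS states, nA actions, random transition kernels and costs
    rng = np.random.default_rng(0)
    nS, nA, gamma = 20, 4, 0.95
    P = rng.random((nA, nS, nS))
    P /= P.sum(axis=2, keepdims=True)        # make each row a probability distribution
    c = rng.random((nA, nS))                 # stage cost c[a, s]

    V = np.zeros(nS)
    for it in range(50):
        Q = c + gamma * (P @ V)              # Q[a, s] for the current value estimate
        pi = Q.argmin(axis=0)                # greedy policy: a generalised-Jacobian choice
        # the Newton step solves (I - gamma P_pi) V = c_pi exactly (policy evaluation)
        P_pi = P[pi, np.arange(nS), :]
        c_pi = c[pi, np.arange(nS)]
        V_new = np.linalg.solve(np.eye(nS) - gamma * P_pi, c_pi)
        if np.max(np.abs(V_new - V)) < 1e-10:
            V = V_new
            break
        V = V_new
    print(f"policy iteration (semi-smooth Newton) converged in {it + 1} steps")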

In a parallel stream, we developed methods for removing the "M" in MPC. Model Predictive Control (MPC) methods are very popular in industry and academia, but their reliance on a model sometimes hampers their deployment in settings where models are difficult to obtain and maintain; an example is energy management in buildings and districts. In this context, we worked on Data Enabled Predictive Control (DeePC) methods that replace the model in the optimisation problem solved by MPC with constraints involving so-called Hankel matrices constructed from data. The key challenge we addressed is dealing with systems that are subject to uncertainty; the key ingredient is appropriate regularisation based on methods from stochastic programming and robust optimisation.
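As a minimal illustration of the idea (the first-order system, horizons and regularisation weight below are hypothetical choices of ours, not tuned values from the project), the following Python sketch builds Hankel matrices from one recorded trajectory and solves a regularised DeePC problem with the convex-optimisation package cvxpy:

    import numpy as np
    import cvxpy as cp

    def hankel(w, L):
        # block-Hankel matrix with L block rows built from a trajectory w of shape (T, m)
        T = w.shape[0]
        return np.hstack([w[i:i + L].reshape(-1, 1) for i in range(T - L + 1)])

    # record one trajectory of a toy first-order SISO system (stand-in for real data)
    rng = np.random.default_rng(2)
    T = 200
    u_d = rng.uniform(-1, 1, (T, 1))
    y_d = np.zeros((T, 1))
    for t in range(1, T):
        y_d[t] = 0.8 * y_d[t - 1] + 0.5 * u_d[t - 1]

    T_ini, N = 4, 10                                  # past window, prediction horizon
    Up, Uf = np.vsplit(hankel(u_d, T_ini + N), [T_ini])
    Yp, Yf = np.vsplit(hankel(y_d, T_ini + N), [T_ini])

    u_ini, y_ini = u_d[-T_ini:].ravel(), y_d[-T_ini:].ravel()  # latest measurements
    r = np.ones(N)                                    # output reference to track

    g, u = cp.Variable(Up.shape[1]), cp.Variable(N)
    lam = 10.0                                        # regularisation weight (a tuning knob)
    cost = cp.sum_squares(Yf @ g - r) + 0.1 * cp.sum_squares(u) + lam * cp.norm1(g)
    constraints = [Up @ g == u_ini, Yp @ g == y_ini,  # consistency with the recent past
                   Uf @ g == u,                       # future inputs are decision variables
                   cp.abs(u) <= 1]                    # input limits
    cp.Problem(cp.Minimize(cost), constraints).solve()
    print("first planned input:", float(u.value[0]))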

OCAL resulted in numerous publications in the leading venues of automatic control and the successful completion of three doctoral theses, with two more nearing completion at the end of the project. Our computational work was also released as two open-source, parallel, high-performance implementations, for GPUs and computing clusters respectively.
OCAL represents a substantial leap forward in the state of the art in optimal control.

Thanks to the project we now better understand the advantages and limitations of using numerical methods for linear systems and linear programs to approximate the dynamic programming formulation of optimal control. Besides our methodological results that others can now build on (such as the introduction of the Relaxed Bellman Operator and the characterisation of its fundamental properties), OCAL also developed open-source computational tools that other groups can use to deploy dynamic programming solutions to optimal control problems of unprecedented scale.

Similarly, in the context of Model Predictive Control, the OCAL results contributed to the development of the Data Enabled Predictive Control methodology, arguably the leading contender for model-free predictive control today. Our results allowed us to place the practice of using regularisation in Data Enabled Predictive Control on solid theoretical foundations. This enabled the deployment of the methods to difficult nonlinear problems such as power systems and the control of urban mobility systems.