Periodic Reporting for period 1 - MONTECARLO (Overcoming the curse of dimensionality through nonlinear stochastic algorithms)
Reporting period: 2023-07-01 to 2025-12-31
It is the key objective of this project to design and analyze approximation algorithms which provably overcome the curse of dimensionality in the case of stochastic optimal control problems, nonlinear PDEs, nonlinear FBSDEs, certain SPDEs, and certain supervised learning problems. We intend to solve many of the above named approximation problems by combining different types of multilevel Monte Carlo approximation methods, in particular, multilevel Picard approximation methods, with stochastic gradient descent (SGD) optimization methods.
Another chief objective of this project is to prove the conjecture that the SGD optimization method converges in the training of artificial neural networks (ANNs) with the ReLU activation. We expect that the outcome of this project will have a significant impact on how high-dimensional PDEs, FBSDEs, and stochastic optimal control problems are solved in engineering and operations research, and on the mathematical understanding of the training of ANNs by means of the SGD optimization method.
We also verified in the training of shallow ReLU ANNs that gradient descent with random initialization almost surely fails to converge to strict saddle points. Moreover, in the training of several shallow residual ANNs with the ReLU activation we revealed the existence of minimizers in the ANN optimization landscape.
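The saddle-avoidance phenomenon can be illustrated on a toy objective (an assumed illustration only, not the shallow ReLU ANN setting of the project): the function f(x, y) = x²/2 + y⁴/4 − y²/2 has a strict saddle point at the origin and minimizers at (0, ±1), and gradient descent from a random initialization almost surely ends up at a minimizer rather than the saddle.

```python
import numpy as np

# Toy illustration (not the ANN setting of the project): the function
# f(x, y) = x**2/2 + y**4/4 - y**2/2 has a strict saddle at (0, 0)
# and minimizers at (0, -1) and (0, 1).
def gradient_descent(x, y, lr=0.1, steps=2000):
    for _ in range(steps):
        gx, gy = x, y**3 - y              # gradient of f
        x, y = x - lr * gx, y - lr * gy
    return x, y

rng = np.random.default_rng(1)
x0, y0 = rng.standard_normal(2)           # random initialization
x, y = gradient_descent(x0, y0)
# x tends to 0 and |y| tends to 1: the iterates reach a minimizer,
# not the strict saddle point at the origin
```

Since a standard normal initialization places (x0, y0) exactly on the saddle's stable manifold (the x-axis) with probability zero, the iterates almost surely escape toward one of the two minimizers.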
In the training of ReLU ANNs we also established several non-convergence results for stochastic gradient descent (SGD) optimization methods (including, e.g., the widely used Adam SGD optimization method). In particular, for supervised learning problems we showed that with high probability SGD methods do not converge to global minimizers in the ANN optimization landscape.
We also established several convergence results for SGD optimization methods. In particular, we proved convergence of SGD with adaptive learning rates. We also introduced a vector-valued function, which we refer to as the Adam vector field, and we revealed that every limit point of the Adam SGD optimization method must be a zero of this Adam vector field. Moreover, we showed convergence of Adam with convergence rates to zeros of the Adam vector field and we proved several a priori bounds for gradient based optimization methods.
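A classical instance of SGD with decaying (adaptive) learning rates can be sketched as follows; this is a minimal toy example under assumed choices (quadratic objective, learning rates α_n = 1/n), not the general setting of the project's convergence results.

```python
import numpy as np

# Minimal sketch (toy problem, assumed for illustration): SGD with the
# classical decaying learning rates alpha_n = 1/n applied to the
# strongly convex stochastic optimization problem
#     minimize  theta -> E[(theta - Z)**2 / 2],   Z ~ N(mu, 1),
# whose unique global minimizer is theta* = mu.
def sgd(mu, theta0, steps, rng):
    theta = theta0
    for n in range(1, steps + 1):
        z = mu + rng.standard_normal()    # one sample of Z
        grad = theta - z                  # stochastic gradient
        theta -= grad / n                 # learning rate alpha_n = 1/n
    return theta

rng = np.random.default_rng(2)
theta = sgd(mu=1.0, theta0=5.0, steps=20000, rng=rng)
# theta ends up close to the global minimizer theta* = 1.0
```

With this particular learning-rate schedule the iterates reduce to the running sample mean of the observed data, which converges to the minimizer by the law of large numbers.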
We also introduced and analyzed suitable multilevel Picard (MLP) approximations that can approximately compute evaluations of solutions of high-dimensional Bellman equations of time-discrete stochastic optimal control problems without the curse of dimensionality (COD). Moreover, we showed that appropriate MLP methods can approximate solutions of high-dimensional semilinear elliptic PDEs with Lipschitz nonlinearities without the COD.
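To convey the flavor of MLP approximations, the following is a hedged, minimal sketch (with assumed model choices, not the schemes analyzed in the project) of the standard MLP recursion for a semilinear heat PDE u_t + ½Δu + f(u) = 0 with terminal condition u(T, ·) = g: the terminal condition is sampled with a plain Monte Carlo average, and the nonlinearity enters through telescoping differences of lower-level Picard iterates evaluated at uniformly sampled intermediate times.

```python
import numpy as np

def mlp(f, g, t, x, T, n, M, rng):
    """Minimal sketch of a multilevel Picard (MLP) approximation
    U_{n,M}(t, x) for the semilinear heat PDE
        u_t + 0.5 * Laplace(u) + f(u) = 0,   u(T, .) = g,
    with n the Picard level and M the Monte Carlo basis number.
    Assumed toy scheme for illustration only."""
    if n == 0:
        return 0.0
    # terminal-condition term: plain Monte Carlo with M**n samples
    W = rng.standard_normal((M**n, x.shape[0])) * np.sqrt(T - t)
    out = np.mean([g(x + w) for w in W])
    # telescoping nonlinearity terms across the lower Picard levels
    for l in range(n):
        mc = M**(n - l)
        acc = 0.0
        for _ in range(mc):
            r = t + (T - t) * rng.uniform()                  # uniform time
            xi = x + rng.standard_normal(x.shape[0]) * np.sqrt(r - t)
            acc += f(mlp(f, g, r, xi, T, l, M, rng))
            if l > 0:                                        # telescoping
                acc -= f(mlp(f, g, r, xi, T, l - 1, M, rng))
        out += (T - t) * acc / mc
    return out
```

As a sanity check, with f ≡ 0 the recursion collapses to a plain Monte Carlo approximation of the linear heat equation, so for g(y) = Σᵢ yᵢ the output is close to Σᵢ xᵢ by the martingale property of Brownian motion.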
In this project we developed a partial solution to this fundamental research problem by introducing a new vector-valued function, which we refer to as the Adam vector field, and by developing a convergence theory based on this vector field. In particular, under strong convexity assumptions we proved that Adam converges with optimal convergence rates to a zero of the Adam vector field. In many cases we also disproved that Adam converges to the unique global minimizer of the considered strongly convex stochastic optimization problem (SOP), as the zero of the Adam vector field does not coincide with the global minimizer of the SOP. Nonetheless, we developed an overall error analysis for Adam for strongly convex SOPs that contains convergence rates for the distance of the Adam iterates to the zero of the Adam vector field in terms of the number of Adam steps, as well as convergence rates for the distance of the zero of the Adam vector field to the global minimizer of the SOP in terms of the parameters of Adam and the mini-batch size. The proposed vector field approach thereby opens the door to a complete solution of the above-sketched fundamental research problem. Furthermore, even though the developed convergence results are only formulated for the Adam SGD optimization method, the arguments in our convergence analysis can also be applied to other related SGD optimization methods and thereby offer the opportunity for a systematic mathematical treatment of a large class of adaptive and/or accelerated SGD optimization methods.
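The objects involved can be made concrete with a hand-rolled Adam iteration on a one-dimensional strongly convex SOP; this is an assumed toy example (problem, hyperparameters, and batch size are illustrative choices), and in this symmetric problem the natural limit point happens to agree with the global minimizer, which, as described above, need not be the case in general.

```python
import numpy as np

# Hedged sketch (assumed toy SOP, not the project's general setting):
# a hand-rolled Adam iteration with mini-batch gradients for
#     minimize  theta -> E[(theta - Z)**2 / 2],   Z ~ N(0, 1),
# whose unique global minimizer is theta* = 0.
def adam(theta, steps, batch, rng, alpha=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for n in range(1, steps + 1):
        z = rng.standard_normal(batch)           # mini-batch of Z samples
        g = theta - z.mean()                     # mini-batch gradient
        m = b1 * m + (1 - b1) * g                # first-moment estimate
        v = b2 * v + (1 - b2) * g**2             # second-moment estimate
        mhat = m / (1 - b1**n)                   # bias corrections
        vhat = v / (1 - b2**n)
        theta -= alpha * mhat / (np.sqrt(vhat) + eps)
    return theta

rng = np.random.default_rng(3)
theta = adam(theta=5.0, steps=5000, batch=32, rng=rng)
# theta ends up near theta* = 0 in this symmetric toy problem
```

Because the second-moment estimate rescales the update componentwise, the limiting behavior of such iterations is governed by a normalized field rather than by the raw gradient, which motivates analyzing limit points through a dedicated vector field instead of the gradient of the objective.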
We also developed the first non-convergence results for SGD methods which show that, with high probability, the risk of SGD methods in the training of ANNs does not converge to the optimal risk value (the infimal value of the objective function). Furthermore, we established the first existence result for global minimizers in the training of residual ANNs with the ReLU activation.