Periodic Reporting for period 1 - MONTECARLO (Overcoming the curse of dimensionality through nonlinear stochastic algorithms)
Reporting period: 2023-07-01 to 2025-12-31
It is the key objective of this project to design and analyze approximation algorithms which provably overcome the curse of dimensionality in the case of stochastic optimal control problems, nonlinear PDEs, nonlinear FBSDEs, certain SPDEs, and certain supervised learning problems. We intend to solve many of the above named approximation problems by combining different types of multilevel Monte Carlo approximation methods, in particular, multilevel Picard approximation methods, with stochastic gradient descent (SGD) optimization methods.
Another chief objective of this project is to prove the conjecture that the SGD optimization method converges in the training of artificial neural networks (ANNs) with the ReLU activation. We expect that the outcome of this project will have a significant impact on how high-dimensional PDEs, FBSDEs, and stochastic optimal control problems are solved in engineering and operations research, and on the mathematical understanding of the training of ANNs by means of the SGD optimization method.
We also verified in the training of shallow ReLU ANNs that gradient descent with random initialization almost surely fails to converge to strict saddle points. Moreover, in the training of several shallow residual ANNs with the ReLU activation we revealed the existence of minimizers in the ANN optimization landscape.
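The saddle-avoidance phenomenon can be illustrated on a toy objective (an assumed illustration only, not the shallow ReLU ANN setting of the project): the function f(x, y) = x²/2 + y⁴/4 − y²/2 has a strict saddle point at the origin and minimizers at (0, ±1), and gradient descent from a random initialization almost surely ends up at a minimizer rather than the saddle.

```python
import numpy as np

# Toy illustration (not the ANN setting of the project): the function
# f(x, y) = x**2/2 + y**4/4 - y**2/2 has a strict saddle at (0, 0)
# and minimizers at (0, -1) and (0, 1).
def gradient_descent(x, y, lr=0.1, steps=2000):
    for _ in range(steps):
        gx, gy = x, y**3 - y              # gradient of f
        x, y = x - lr * gx, y - lr * gy
    return x, y

rng = np.random.default_rng(1)
x0, y0 = rng.standard_normal(2)           # random initialization
x, y = gradient_descent(x0, y0)
# x tends to 0 and |y| tends to 1: the iterates reach a minimizer,
# not the strict saddle point at the origin
```

Since a standard normal initialization places (x0, y0) exactly on the saddle's stable manifold (the x-axis) with probability zero, the iterates almost surely escape toward one of the two minimizers.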
In the training of ReLU ANNs we also established several non-convergence results for stochastic gradient descent (SGD) optimization methods (including, e.g., the widely used Adam SGD optimization method). In particular, for supervised learning problems we showed that with high probability SGD methods do not converge to global minimizers in the ANN optimization landscape.
We also established several convergence results for SGD optimization methods. In particular, we proved convergence of SGD with adaptive learning rates. We also introduced a vector-valued function, which we refer to as the Adam vector field, and we revealed that every limit point of the Adam SGD optimization method must be a zero of this Adam vector field. Moreover, we showed convergence of Adam with convergence rates to zeros of the Adam vector field and we proved several a priori bounds for gradient based optimization methods.
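A classical instance of SGD with decaying (adaptive) learning rates can be sketched as follows; this is a minimal toy example under assumed choices (quadratic objective, learning rates α_n = 1/n), not the general setting of the project's convergence results.

```python
import numpy as np

# Minimal sketch (toy problem, assumed for illustration): SGD with the
# classical decaying learning rates alpha_n = 1/n applied to the
# strongly convex stochastic optimization problem
#     minimize  theta -> E[(theta - Z)**2 / 2],   Z ~ N(mu, 1),
# whose unique global minimizer is theta* = mu.
def sgd(mu, theta0, steps, rng):
    theta = theta0
    for n in range(1, steps + 1):
        z = mu + rng.standard_normal()    # one sample of Z
        grad = theta - z                  # stochastic gradient
        theta -= grad / n                 # learning rate alpha_n = 1/n
    return theta

rng = np.random.default_rng(2)
theta = sgd(mu=1.0, theta0=5.0, steps=20000, rng=rng)
# theta ends up close to the global minimizer theta* = 1.0
```

With this particular learning-rate schedule the iterates reduce to the running sample mean of the observed data, which converges to the minimizer by the law of large numbers.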
We also introduced and analyzed suitable multilevel Picard (MLP) approximations that can approximately compute evaluations of solutions of high-dimensional Bellman equations of time-discrete stochastic optimal control problems without the curse of dimensionality (COD). Moreover, we showed that appropriate MLP methods can approximate solutions of high-dimensional semilinear elliptic PDEs with Lipschitz nonlinearities without the COD.
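To convey the flavor of MLP approximations, the following is a hedged, minimal sketch (with assumed model choices, not the schemes analyzed in the project) of the standard MLP recursion for a semilinear heat PDE u_t + ½Δu + f(u) = 0 with terminal condition u(T, ·) = g: the terminal condition is sampled with a plain Monte Carlo average, and the nonlinearity enters through telescoping differences of lower-level Picard iterates evaluated at uniformly sampled intermediate times.

```python
import numpy as np

def mlp(f, g, t, x, T, n, M, rng):
    """Minimal sketch of a multilevel Picard (MLP) approximation
    U_{n,M}(t, x) for the semilinear heat PDE
        u_t + 0.5 * Laplace(u) + f(u) = 0,   u(T, .) = g,
    with n the Picard level and M the Monte Carlo basis number.
    Assumed toy scheme for illustration only."""
    if n == 0:
        return 0.0
    # terminal-condition term: plain Monte Carlo with M**n samples
    W = rng.standard_normal((M**n, x.shape[0])) * np.sqrt(T - t)
    out = np.mean([g(x + w) for w in W])
    # telescoping nonlinearity terms across the lower Picard levels
    for l in range(n):
        mc = M**(n - l)
        acc = 0.0
        for _ in range(mc):
            r = t + (T - t) * rng.uniform()                  # uniform time
            xi = x + rng.standard_normal(x.shape[0]) * np.sqrt(r - t)
            acc += f(mlp(f, g, r, xi, T, l, M, rng))
            if l > 0:                                        # telescoping
                acc -= f(mlp(f, g, r, xi, T, l - 1, M, rng))
        out += (T - t) * acc / mc
    return out
```

As a sanity check, with f ≡ 0 the recursion collapses to a plain Monte Carlo approximation of the linear heat equation, so for g(y) = Σᵢ yᵢ the output is close to Σᵢ xᵢ by the martingale property of Brownian motion.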
In this project we developed a partial solution to this fundamental research problem by introducing a new vector-valued function, which we refer to as the Adam vector field, and by developing a convergence theory based on this vector field. In particular, under strong convexity assumptions we proved that Adam converges with optimal convergence rates to a zero of the Adam vector field. In many cases we also disproved that Adam converges to the unique global minimizer of the considered strongly convex stochastic optimization problem (SOP), as the zero of the Adam vector field does not coincide with the global minimizer of the SOP. Nonetheless, we developed an overall error analysis for Adam for strongly convex SOPs that contains convergence rates for the distance of the Adam iterates to the zero of the Adam vector field in terms of the number of Adam steps, as well as convergence rates for the distance of the zero of the Adam vector field to the global minimizer of the SOP in terms of the parameters of Adam and the mini-batch size. The proposed vector field approach thereby opens the door to a complete solution of the above-sketched fundamental research problem. Furthermore, even though the developed convergence results are only formulated for the Adam SGD optimization method, the arguments in our convergence analysis can also be applied to other related SGD optimization methods and thereby offer the opportunity for a systematic mathematical treatment of a large class of adaptive and/or accelerated SGD optimization methods.
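The objects involved can be made concrete with a hand-rolled Adam iteration on a one-dimensional strongly convex SOP; this is an assumed toy example (problem, hyperparameters, and batch size are illustrative choices), and in this symmetric problem the natural limit point happens to agree with the global minimizer, which, as described above, need not be the case in general.

```python
import numpy as np

# Hedged sketch (assumed toy SOP, not the project's general setting):
# a hand-rolled Adam iteration with mini-batch gradients for
#     minimize  theta -> E[(theta - Z)**2 / 2],   Z ~ N(0, 1),
# whose unique global minimizer is theta* = 0.
def adam(theta, steps, batch, rng, alpha=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for n in range(1, steps + 1):
        z = rng.standard_normal(batch)           # mini-batch of Z samples
        g = theta - z.mean()                     # mini-batch gradient
        m = b1 * m + (1 - b1) * g                # first-moment estimate
        v = b2 * v + (1 - b2) * g**2             # second-moment estimate
        mhat = m / (1 - b1**n)                   # bias corrections
        vhat = v / (1 - b2**n)
        theta -= alpha * mhat / (np.sqrt(vhat) + eps)
    return theta

rng = np.random.default_rng(3)
theta = adam(theta=5.0, steps=5000, batch=32, rng=rng)
# theta ends up near theta* = 0 in this symmetric toy problem
```

Because the second-moment estimate rescales the update componentwise, the limiting behavior of such iterations is governed by a normalized field rather than by the raw gradient, which motivates analyzing limit points through a dedicated vector field instead of the gradient of the objective.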
We also developed the first non-convergence results for SGD methods which show that, with high probability, the risk of SGD methods in the training of ANNs does not converge to the optimal risk value (the infimal value of the objective function). Furthermore, we established the first existence result for global minimizers in the training of residual ANNs with the ReLU activation.