## Periodic Reporting for period 2 - ACCOPT (ACelerated COnvex OPTimization)

Reporting period: 2020-03-01 to 2021-08-31

Rapid progress in computer technologies and telecommunications presents many new challenges for optimization. It is now easy to collect very large volumes of information. The availability of search engines tremendously facilitates the creation of nontrivial models. The growth in computer power and the possibilities of distributed computation significantly increase the abilities of numerical methods.

We can now speak about a new type of problem, waiting for the development of efficient optimization schemes. Such problems are usually very big in size, very special in structure, and often have distributed data. This makes them unsolvable by standard methods: the old theoretical approaches, based on black-box information, cannot work, and new theoretical and algorithmic solutions are urgently needed. In this project, we concentrate on the development of fast optimization methods for problems of big and very big size. All new methods are endowed with provable efficiency guarantees for convex optimization problems arising in practical applications. Our main tool is the acceleration technique as applied to smooth convex functions, which we adapt to different situations.
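As an illustration of the acceleration technique for smooth convex functions, the following sketch implements a standard fast gradient (FISTA-type) scheme; the quadratic test problem and all parameter values here are purely illustrative and are not taken from the project.

```python
import numpy as np

def accelerated_gradient(grad, x0, L, n_iters):
    """Fast gradient method for an L-smooth convex function,
    given its gradient oracle `grad` (Nesterov-type momentum)."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(n_iters):
        x_new = y - grad(y) / L                        # gradient step from the extrapolated point
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

# Illustration on a simple quadratic f(x) = 0.5 x'Ax - b'x with L = 100.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 1.0, 1.0])
x_star = np.linalg.solve(A, b)                         # exact minimizer
x = accelerated_gradient(lambda v: A @ v - b, np.zeros(3), L=100.0, n_iters=1000)
print(np.linalg.norm(x - x_star))                      # small residual
```

The guaranteed rate for this scheme is f(x_k) - f* <= 2L||x0 - x*||^2 / (k+1)^2, versus O(L/k) for plain gradient descent.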

The main novelty of the current situation in optimization is the possibility of efficient use of high-order methods. The standard methods for solving large-scale problems have been first-order schemes. High-order methods are faster both in theory and in practice; however, their iterations are much more complicated and require evaluation of expensive high-order derivatives. Thus, it was not clear whether the significant drop in the number of iterations would be offset by the internal complexity of these schemes. The main goal of this project is to prove that high-order schemes are indeed more efficient in practice.

To achieve our objectives, we perform a detailed complexity analysis of 2nd-order methods and more general tensor methods. It would be very interesting to develop a new methodology of complexity analysis that works equally well for methods of any order; however, we need to start with further development of the existing technique.

Our first research direction is the detailed analysis of 2nd-order schemes for different problem classes. To ensure the applicability of these methods to large-scale problems, we analyze their performance under inexact implementation of each iteration. Another way to decrease the iteration cost is a proper implementation of the quasi-Newton technique for approximating Hessians. We need to justify global complexity bounds for quasi-Newton methods, which are still unknown. We also need to clarify the higher abilities of tensor methods, keeping in mind an efficient implementation of the iterations of these schemes. For methods of all types, we need to develop an acceleration technique similar to the approach of estimating functions existing for 1st-order methods.
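To illustrate the quasi-Newton idea of approximating Hessians from gradient information alone, here is a minimal sketch of the classical BFGS update; the matrices and vectors below are illustrative and not taken from the project's own schemes.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of a Hessian approximation B, given the step
    s = x_{k+1} - x_k and gradient difference y = g_{k+1} - g_k."""
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)   # remove old curvature along s
            + np.outer(y, y) / (y @ s))     # insert observed curvature

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # true Hessian of a quadratic
B = np.eye(2)                               # initial approximation
s = np.array([1.0, 0.5])
y = A @ s                                   # on a quadratic, y = A s exactly
B = bfgs_update(B, s, y)
print(np.allclose(B @ s, y))                # secant equation B_{k+1} s = y holds; prints True
```

The update needs only first-order information, which is what makes the global complexity analysis of such methods both difficult and valuable.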

Our main problem classes are related to smooth approximations of non-differentiable functions. We extend the standard problem setting by considering approximate solutions in relative scale. We are going to develop efficient strategies applicable to sparse derivatives of different orders. Finally, we will develop accelerated methods for huge-scale problems. In our approach, the main source of improvement will be the proper use of problem structure.
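A classical example of such a smooth approximation is the log-sum-exp smoothing of the max function, which is differentiable and deviates from the true max by at most mu*log(n); the test values below are purely illustrative.

```python
import numpy as np

def smooth_max(x, mu):
    """Log-sum-exp smoothing of max(x): differentiable, with
    max(x) <= smooth_max(x, mu) <= max(x) + mu * log(len(x)).
    Shifting by max(x) keeps the exponentials numerically stable."""
    m = np.max(x)
    return m + mu * np.log(np.sum(np.exp((x - m) / mu)))

x = np.array([0.3, 1.0, -2.0])
for mu in (1.0, 0.1, 0.01):
    print(mu, smooth_max(x, mu) - np.max(x))  # approximation gap shrinks with mu
```

Decreasing mu tightens the approximation but increases the Lipschitz constant of the gradient (of order 1/mu), which is exactly the trade-off that accelerated methods exploit.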

Our goal is to solve important problems that currently look unsolvable. The theoretical development of optimization must reach the state where there is no gap between theory and practice: theoretically efficient methods must considerably outperform any heuristics. Optimization is becoming more and more important. Many new applications of Artificial Intelligence, such as Neural Networks, Deep Learning, and Image Processing, intensively use new optimization methods. Progress in these fields strongly depends on the new abilities provided by new optimization methods.


We managed to find two strong Ph.D. students and hire five very active postdoctoral researchers. During these 30 months, we organized an efficient collaboration of young researchers, advancing significantly in topics related to the development of accelerated methods, independently of their order. Our main field of interest during this period was the theory of 2nd-order schemes. In optimization, the development of the corresponding theory was delayed for decades due to the absence of a convenient framework for global complexity analysis. We extended our approach to quasi-Newton methods, which need no information on the 2nd derivatives. We also looked at tensor methods, investigating their accelerated convergence in both the convex and nonconvex cases. For convex problems, we obtained additional acceleration based on the approach of estimating functions.

Our research results are reflected in 18 full-length papers published in high-level peer-reviewed journals. The most interesting results are as follows.

Paper [6]: A new complexity analysis of tensor methods under weaker assumptions on the smoothness of the derivatives.

Paper [4]: We proved the global super-linear convergence of quasi-Newton methods.

Paper [9]: We developed a new approach based on contracting the space of variables, explaining the acceleration phenomena of all tensor methods.

Paper [10]: The first justification of the local super-linear convergence of tensor methods, based on new nondegeneracy conditions.

Paper [2]: The first justification of an implementable third-order method, which is able to solve real-life problems.


We advanced in several research directions. For 2nd-order methods, we studied their performance on the class of uniformly convex functions [11] and developed a general framework based on contractions, which allows acceleration of all proximal schemes [9]. For tensor methods, we studied their behavior for functions with relaxed smoothness conditions [6] and with inexact solutions of the auxiliary problem [7]. We justified their local rate of convergence [10] and studied their efficiency for finding stationary points [8]. For quasi-Newton methods, we obtained the first global complexity results and quantified their local rate of convergence [3, 4, 5]. We developed the first implementable third-order scheme with better complexity characteristics than 2nd-order methods.
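Global complexity results for 2nd-order methods of this kind are typically built around a cubically regularized Newton step. The sketch below computes such a step for a given gradient and positive-definite Hessian; the scalar bisection solver and all numerical values are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-10):
    """Step of cubically regularized Newton: minimize the model
    g.h + 0.5 h'Hh + (M/6)||h||^3.  For positive-definite H the
    minimizer has the form h(r) = -(H + (M r / 2) I)^{-1} g with
    ||h(r)|| = r, so we find the radius r by bisection."""
    n = g.size
    def h_of(r):
        return -np.linalg.solve(H + 0.5 * M * r * np.eye(n), g)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(h_of(hi)) > hi:       # bracket the root of ||h(r)|| - r
        hi *= 2.0
    for _ in range(200):                       # bisection on the radius
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(h_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return h_of(hi)

g = np.array([1.0, 2.0])
H = np.array([[2.0, 0.0], [0.0, 5.0]])         # positive definite (assumed)
M = 1.0                                        # regularization parameter (illustrative)
h = cubic_newton_step(g, H, M)
# Stationarity of the cubic model: g + H h + (M/2)||h|| h should vanish.
print(np.linalg.norm(g + H @ h + 0.5 * M * np.linalg.norm(h) * h))
```

Unlike the classical Newton step, this regularized step is well defined globally, which is what makes worst-case complexity bounds for 2nd-order schemes possible.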

For future developments, we obtained a general two-level framework, BLUM, for constructing optimization schemes. At the first level, we choose a specific high-order proximal method with a very high rate of convergence. This method is very abstract, and its justification needs no assumptions on the properties of the objective function. For implementing its iteration, we apply a lower-level method based on the properties of derivatives of a certain degree. This allows us to use lower-level methods for problem classes that are traditionally attributed to high-order schemes. Thus, we can construct methods with convergence rates going above the existing theoretical limits, and completely change our understanding of the abilities of optimization methods. This is the beginning of a long process. Our hopes are confirmed by second-order methods whose convergence surpasses the theoretical limits (after a slight change of the problem class).

At this moment, this approach is described in three papers submitted to journals. Its proper implementation gives us the possibility of significant advancement in all research directions of the project.
