Periodic Reporting for period 4 - ACCOPT (ACelerated COnvex OPTimization)
Reporting period: 2023-03-01 to 2024-08-31
Optimization is becoming more and more important. Many new applications of Artificial Intelligence, such as Neural Networks, Deep Learning, and Image Processing, rely intensively on new optimization methods. The progress in these fields strongly depends on the capabilities of the optimization methods.
The most important result of our project is the development of a new framework based on a new definition of the order of optimization schemes. This opens a very interesting possibility of applying lower-order methods to problem classes traditionally associated with higher-order schemes. Further development of this line of research will give us much more efficient and cheaper optimization methods.
In [2], we make the first but very essential step in constructing implementable tensor methods, a problem that had remained open for decades. We prove that a properly regularized Taylor polynomial is a convex function. This opens the possibility of using higher-order methods in practical computations.
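To give a flavour of this result (a sketch, in notation that may differ slightly from [2]): if f is convex and its p-th derivative is Lipschitz continuous with constant L_p, then the regularized Taylor polynomial

  \[ \Omega_{x,p,M}(y) \;=\; f(x) + \sum_{i=1}^{p} \frac{1}{i!}\, D^i f(x)[y-x]^i \;+\; \frac{M}{(p+1)!}\, \| y - x \|^{p+1} \]

is convex whenever M \ge p L_p. Hence each iteration of the tensor method reduces to minimizing a convex multivariate polynomial.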
In [3], we propose a new quasi-Newton scheme admitting a global worst-case superlinear rate of convergence. This is the first result of this type in the literature. In [4, 5], a similar result is proved for the classical quasi-Newton methods.
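For context (this is the classical setting addressed in [4, 5], not the new updates proposed in [3]), a quasi-Newton method such as BFGS maintains a Hessian approximation B_k and updates it from the observed gradient differences:

  \[ s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad B_{k+1} = B_k - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k} + \frac{y_k y_k^{\top}}{y_k^{\top} s_k}. \]

The cited results give explicit worst-case superlinear rates for such schemes.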
In [6], we present a new complexity analysis of tensor methods under weaker assumptions on the smoothness of the derivatives. In [7], we analyze the behavior of tensor methods in the inexact setting, which is typical for practical implementations.
In [8], we analyze the performance of high-order tensor methods on convex functions when the goal is to reduce the norm of the gradient.
In [9], we justify a new explanation of the acceleration phenomenon for first-order methods based on a contracting operation. This also helps in constructing high-order accelerated methods.
In [10], we prove the local superlinear rate of convergence of tensor methods under natural non-degeneracy assumptions. In [11], we analyze the interaction of the Cubic Regularization technique with uniform convexity and show that the global rate of convergence becomes linear for convexity of degree three. In [12], for high-order tensor methods, we suggest a dynamic strategy for choosing the accuracy of the auxiliary problem.
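For reference, one step of the Cubic Regularization scheme analyzed in [11] has the standard form (a sketch, up to the choice of normalization constants, with M an upper bound for the Lipschitz constant of the Hessian):

  \[ T_M(x) = \arg\min_{y} \Big\{ \langle \nabla f(x), y-x \rangle + \tfrac{1}{2} \langle \nabla^2 f(x)(y-x), y-x \rangle + \tfrac{M}{6} \| y-x \|^{3} \Big\}, \]

while uniform convexity of degree three means f(y) \ge f(x) + \langle \nabla f(x), y-x \rangle + \tfrac{\sigma}{3} \| y-x \|^{3}. Since the cubic regularizer matches the growth of this function class, the global rate of convergence becomes linear.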
In [13], we analyze a Newton method generating test points in a small-dimensional stochastic subspace. Thus, the complexity of each iteration becomes very small.
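A minimal sketch of this idea in Python (illustrative only, not the precise scheme of [13]; the Gaussian sketching matrix, the subspace dimension k, and the small regularization are assumptions made here for simplicity):

  import numpy as np

  def subspace_newton_step(grad, hess, x, k=5, rng=None):
      # One Newton step restricted to a random k-dimensional subspace.
      # In practice the k x k projected Hessian would be assembled from
      # k Hessian-vector products rather than from the full matrix.
      rng = np.random.default_rng() if rng is None else rng
      n = x.size
      S = rng.standard_normal((n, k)) / np.sqrt(k)    # random subspace basis
      g = S.T @ grad(x)                               # reduced gradient (size k)
      H = S.T @ hess(x) @ S                           # reduced Hessian (k x k)
      h = np.linalg.solve(H + 1e-8 * np.eye(k), g)    # small regularized system
      return x - S @ h                                # step back in the full space

Only a k x k linear system is solved per iteration, which is the source of the low per-iteration cost mentioned above.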
In [14], we analyze a new class of functions admitting a global lower bound on their growth in terms of the second derivatives. We show how this can be used for accelerating second-order methods.
In [17], we propose a first approach to the clustering of multi-dimensional arrays of features by a convex optimization technique. We interpret soft clustering as the result of elections in democratic countries: after each round of elections, a party updates its position in accordance with the opinions of the voters it has attracted. The sequence of these elections can be seen as an alternating minimization method with a global linear rate of convergence to a unique stationary point.
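To make the alternating structure concrete, here is a generic soft-clustering sketch (illustrative only: it follows the usual soft-assignment/weighted-mean alternation, not the specific convex election-based formulation of [17], and the softmax weights with temperature beta are an assumption):

  import numpy as np

  def soft_clustering(X, K, T=100, beta=5.0, seed=0):
      # Alternating minimization: (1) each point ("voter") spreads its support
      # over the K centers ("parties"); (2) each center moves to the weighted
      # mean of its supporters.
      rng = np.random.default_rng(seed)
      centers = X[rng.choice(len(X), size=K, replace=False)]
      for _ in range(T):
          d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
          W = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
          W /= W.sum(axis=1, keepdims=True)               # soft assignments
          centers = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted means
      return W, centers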
In [18], which is written for the general public, we explain the goals and the tools of our project and also discuss the most motivating applications.
In [21], we show that the contracting technique of [9] can be implemented in an affine-invariant way. Hence, the complexity bounds of the corresponding methods do not depend on the choice of coordinates.
In [22], we introduce the high-order two-level proximal methods. The methods of the upper level rely on computations of a proximal-point operator, whose order depends on our ability to compute its approximation at a reasonable computational cost. In this way, we can apply lower-order methods to problem classes traditionally attributed to high-order schemes. In [23], we complement this technique with an auxiliary line-search procedure.
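The central object here, written as a sketch (normalization constants may differ from those in [22]), is the p-th order proximal-point operator

  \[ \mathrm{prox}^{p}_{f,H}(\bar x) \;=\; \arg\min_{y} \Big\{ f(y) + \frac{H}{p+1}\, \| y - \bar x \|^{p+1} \Big\}. \]

The order of the overall method is then determined by how accurately this operator can be approximated at a reasonable computational cost.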
In [24], we show that all third-order optimization methods can be implemented using only second-order derivatives. This result eliminates pure third-order methods from computational practice. In [25], we analyze a second-order method in which the coefficient of the cubic proximal term is chosen proportionally to the norm of the gradient. For functions with Lipschitz-continuous third derivative, this method has the same rate of convergence as the basic third-order method.
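Formalizing the rule of [25] literally as described above (a sketch; c denotes an unspecified positive constant), each iteration minimizes the quadratic model of f with a cubic proximal term whose coefficient is proportional to the current gradient norm:

  \[ x_{+} = \arg\min_{y} \Big\{ \langle \nabla f(x), y-x \rangle + \tfrac{1}{2} \langle \nabla^2 f(x)(y-x), y-x \rangle + \tfrac{c\,\|\nabla f(x)\|}{6}\, \| y-x \|^{3} \Big\}. \]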
In [26], we adapt the powerful polynomial-time interior-point methods to modern problems of Machine Learning and Artificial Intelligence. We introduce a new class of set-limited functions, which help in constructing barriers for the epigraphs of convex functions. This allows us to travel from the starting point directly to the optimal solution of the problem.
In [27], we suggest a combination of the subgradient and Ellipsoid methods for minimizing non-smooth convex functions. Its complexity depends continuously on the dimension of the problem. In [28], we introduce Fully Composite problems and propose for them several new methods based on contraction of the primal space.
In [29], we study different adaptive strategies for choosing the parameters of the problem classes containing the objective function. These parameters are important for the convergence rate of high-order methods.
In [30], we justify the first method with provable complexity bounds for all tractable classes of convex problems. For all these classes, it automatically ensures the correct global rate of convergence.
1. The acceleration phenomenon is now explained by a more general contraction technique [21]. For quasi-Newton methods, we can now justify their global performance, including the superlinear rate [3, 4].
2. For the second-order methods, we proved a very unexpected result showing that all third-order methods can be implemented by second-order methods [24], keeping the global rate of convergence unchanged [25].
3. A fundamental breakthrough is the development of a super-universal second-order scheme that works properly on all problem classes admitting worst-case complexity bounds.
4. The fundamental result of [2] demonstrates that high-order optimization methods can be treated by Convex Optimization techniques. We analyzed the performance of higher-order methods in different situations, including inexactness in the auxiliary problem [7, 12, 16].
5. Our final fundamental contribution is a new definition of the order of a method, determined by the order of the high-order proximal-point operator underlying the optimization scheme [22, 23]. Hence, lower-order methods can be used for the classes traditionally attributed to higher-order schemes.