
Constrained Low-Rank Matrix Approximations: Theoretical and Algorithmic Developments for Practitioners

Periodic Reporting for period 4 - COLORAMAP (Constrained Low-Rank Matrix Approximations: Theoretical and Algorithmic Developments for Practitioners)

Reporting period: 2021-03-01 to 2021-08-31

What is the problem/issue being addressed?
This project focuses on the following class of constrained low-rank matrix approximation (CLRMA) problems: given an m-by-n matrix M and a factorization rank r, find an m-by-r matrix W and an r-by-n matrix H, satisfying application-dependent constraints (e.g. nonnegativity), such that the product WH approximates M as closely as possible.
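In the unconstrained case, this problem has a closed-form solution via the truncated singular value decomposition (Eckart-Young theorem). A minimal Python sketch of the W, H factorization template, with hypothetical matrix sizes; constrained variants such as NMF replace this closed form with iterative algorithms:

```python
import numpy as np

# Unconstrained low-rank approximation via truncated SVD: the best
# rank-r approximation of M in the Frobenius norm (Eckart-Young).
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 6))   # an m-by-n data matrix
r = 2                             # target factorization rank

U, s, Vt = np.linalg.svd(M, full_matrices=False)
W = U[:, :r] * s[:r]              # m-by-r factor
H = Vt[:r, :]                     # r-by-n factor
M_r = W @ H                       # rank-r approximation of M
```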
Why is it important for society?
CLRMA models are very flexible and can be used for many problems, in particular in data analysis: e.g. background subtraction in a video sequence, recommender systems that predict the preferences of users for some items based on their preferences for other items, image segmentation, document classification, and audio source separation (and, more generally, blind source separation). The class of CLRMA problems contains, for example, PCA, k-means, subspace clustering, NMF, and ICA.

What are the overall objectives?
This project has four main building blocks:
(1) 'Computational complexity', whose goal is to understand CLRMA problems better: can we solve these problems, and under which conditions?
(2) 'Provably correct algorithms', whose goal is to design algorithms that provably recover the optimal solution. CLRMA problems are difficult in general but, under some assumptions, some of them can be solved efficiently.
(3) 'Heuristics', whose goal is to design algorithms (without optimality guarantees) for difficult instances.
(4) 'Applications', whose goal is to apply the developed algorithms to real-world problems such as the ones mentioned above.
Let us list our main results following our four main building blocks:
(1) Computational complexity: we proved the NP-hardness of CLRMA with the l1 and l-infinity norms, two important variants. Using geometric intuition, we also obtained a better understanding of sparse CLRMA problems and characterized situations where the optimal solution is unique (up to permutation and scaling).
(2) Provably correct algorithms: we have designed a fast and provably correct algorithm for NMF under the separability condition. We have also developed algorithms for minimum-volume NMF, which allow NMF to be solved under assumptions weaker than separability, as well as provably correct algorithms for (i) simplex-structured matrix factorization, (ii) linear-quadratic NMF, and (iii) convolutive NMF.
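As an illustration of this class of algorithms, the successive projection algorithm (SPA) is a standard fast, provably correct method for separable NMF: if each column of W appears among the columns of M (separability) and W has full column rank, SPA recovers the indices of those "pure" columns. The sketch below is a generic textbook version, not necessarily the project's exact variant:

```python
import numpy as np

def spa(M, r):
    """Successive Projection Algorithm for separable NMF: greedily pick
    the column of largest norm, then project the remaining columns onto
    its orthogonal complement, r times."""
    R = M.astype(float).copy()
    indices = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # extreme column
        indices.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                        # project it out
    return indices
```

On a synthetic separable matrix M = W[I, H'] (pure columns first, followed by convex mixtures), `spa` returns the indices of the pure columns.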
(3) Heuristics: We have developed several heuristic algorithms for the following CLRMA problems: NMF (using momentum/extrapolation), subspace clustering (using sampling and multilayer graphs), positive semidefinite factorization (that are particularly useful for extended formulations), sparse component analysis, dictionary-based factorizations, robust symmetric NMF, NMF based on beta-divergence with equality constraints, . More recently, we have developed general optimization tools that can be used to solve CLRMA. They are based on first-order and block coordinate descent methods with momentum, and are guaranteed to converge to stationary point.
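To illustrate the momentum/extrapolation idea, the sketch below combines the classical multiplicative updates for NMF with a simple fixed-weight extrapolation step. This is a hypothetical illustration in the spirit of such accelerated heuristics, not the project's exact algorithm; `beta` is an assumed momentum weight:

```python
import numpy as np

def nmf_mu_extrapolated(M, r, n_iter=300, beta=0.3, seed=0):
    """Heuristic NMF for min ||M - WH||_F^2 s.t. W, H >= 0:
    multiplicative updates evaluated at extrapolated (momentum) points."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    W_prev, H_prev = W.copy(), H.copy()
    eps = 1e-12
    for _ in range(n_iter):
        # extrapolated points, clipped to stay nonnegative
        We = np.maximum(W + beta * (W - W_prev), eps)
        He = np.maximum(H + beta * (H - H_prev), eps)
        W_prev, H_prev = W, H
        # classical multiplicative updates, applied at (We, He)
        H = He * (We.T @ M) / (We.T @ We @ He + eps)
        W = We * (M @ H.T) / (We @ (H @ H.T) + eps)
    return W, H
```

In practice, accelerated variants use adaptive momentum weights with restarts; a fixed small `beta` keeps the sketch short.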
Moreover, we have come up with a new application of CLRMA. We proposed a completely new model using a factorization of stable matrices, namely A = (J-R)Q, where J is antisymmetric (J^T = -J) and R and Q are positive semidefinite. All previous work operated on the matrix A itself, trying to control its eigenvalues directly. With our formulation, a matrix is stable if and only if it admits a factorization of the form (J-R)Q with the above constraints. This work led to many other results, e.g. for matrix pairs and positive-real systems.
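One direction of this characterization is easy to check numerically: any matrix of the form (J-R)Q with J antisymmetric, R positive semidefinite, and Q positive definite has no eigenvalue with positive real part. A small sanity-check sketch with randomly generated factors:

```python
import numpy as np

# Build A = (J - R) Q with J antisymmetric, R PSD, Q PD; by the
# characterization above, A is stable: all eigenvalues lie in the
# closed left half-plane.
rng = np.random.default_rng(2)
n = 5
B = rng.standard_normal((n, n))
J = B - B.T                    # antisymmetric: J^T = -J
C = rng.standard_normal((n, n))
R = C @ C.T                    # positive semidefinite
D = rng.standard_normal((n, n))
Q = D @ D.T + np.eye(n)        # positive definite

A = (J - R) @ Q
eigs = np.linalg.eigvals(A)
```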
(4) Applications: we have applied all the algorithms mentioned above to real-world problems, with a focus on document classification, audio source separation, and hyperspectral unmixing. We have also developed new models particularly well suited to applications, namely distributionally robust NMF, generalized separable NMF, and off-diagonal symmetric NMF, for which we have also proposed efficient algorithms.
*Progress beyond the state of the art*
For the NP-hardness proofs, we reduced known NP-hard problems to ours, as is standard in the literature; achieving this, however, required new and nontrivial reduction techniques. Our provably correct algorithms for linear-quadratic NMF, separable NMF, convolutive NMF, generalized separable NMF, minimum-volume NMF, and simplex-structured MF were shown to outperform the state of the art. Similarly, the heuristic algorithms we have developed consistently compared favourably with the state of the art, and our new CLRMA models (including the one for finding the nearest stable matrix to an unstable one, distributionally robust NMF, off-diagonal symmetric NMF, and generalized separable NMF) outperformed it as well. This translated into better performance on the considered real-world applications.

*Expected results until the end of the project*
Currently we are focusing on three aspects:
(1) Provably correct algorithms: we are looking into such algorithms for linear and nonlinear subspace clustering, for NMF (where we study ways to characterize the key 'sufficiently scattered' condition), and for a tri-factorization model.
(2) Heuristic algorithms: we are developing (i) a robust NMF model based on the l1 norm for sparse data sets, (ii) more efficient general-purpose algorithms using Bregman distances, (iii) algorithms for sparse nonnegative least squares, and (iv) algorithms for minimum-volume NMF.
(3) Applications: we are particularly looking into hyperspectral unmixing with the refined linear-quadratic factorization model, as well as document classification and recommender systems, for which we are trying to develop more interpretable models.
These results have been obtained in the last months of the project; see "Major Achievements".
(Figure: applications covered by the CLRMA models studied in this project)