CORDIS - EU research results

Numerical Optimal tRansport for ImAging

Periodic Reporting for period 4 - NORIA (Numerical Optimal tRansport for ImAging)

Reporting period: 2022-04-01 to 2023-09-30

The NORIA project develops innovative computational and theoretical approaches in machine learning, with a focus on Optimal Transport (OT) theory. A key challenge in advancing deep generative models is statistically comparing high-dimensional distributions, such as training datasets and the synthetic data ("deep fakes") that these models generate. This capability is essential for high-quality generation of images, videos, and texts, and for controlling potential biases during generation. Scaling OT to high-dimensional learning is a significant area of research, as it offers the most effective means of attaining these objectives. To this end, the NORIA project has developed novel OT solvers that exploit sparsity and low-rank structures, enabling OT to scale to high-dimensional problems in machine learning. On the practical side, a major accomplishment of the project is the application of these methods to advance the state of the art in single-cell genomics. Specifically, NORIA has created the Mowgli Python package, which uses OT for the processing and clustering of high-dimensional single-cell data.
The first phase of the NORIA project developed stochastic methods to scale optimal transport to high-dimensional learning. The aim was to improve on existing methods by orders of magnitude, both by leveraging GPU computational architectures and by enhancing statistical efficiency in high dimensions. The two core ideas developed by the team to achieve these goals are: (i) using entropic regularization of optimal transport problems, and (ii) introducing online stochastic optimization methods capable of handling streams of samples. In the second phase, these methods were extended to incorporate key features of large-scale real datasets, with a particular focus on sparsity and low-rank structures. During the final phase, NORIA extended these methods to more complex non-convex OT problems, such as the Gromov-Wasserstein problem. The project also adapted and extended these techniques to single-cell genomics problems. This culminated in the release of the open-source Python package Mowgli, which surpasses the state of the art in the analysis of single-cell data.
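To make idea (i) concrete, the following is a minimal sketch of entropic regularization solved with the textbook Sinkhorn iterations on two small point clouds. It is not the project's solver: it is dense, CPU-only, and non-stochastic, and the function name `sinkhorn`, the parameters `eps` and `n_iters`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.2, n_iters=1000):
    """Entropic OT between histograms a and b for a cost matrix C.

    Illustrative textbook Sinkhorn, not the NORIA implementation.
    Returns the transport plan P and the transport cost <P, C>.
    """
    K = np.exp(-C / eps)           # Gibbs kernel induced by the cost
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)            # rescale to match the row marginals
        v = b / (K.T @ u)          # rescale to match the column marginals
    P = u[:, None] * K * v[None, :]
    return P, float(np.sum(P * C))


# Toy data (hypothetical): two small point clouds in the plane.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(7, 2)) + 1.0

# Squared Euclidean cost, rescaled to [0, 1] so exp(-C / eps) stays well-conditioned.
C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
C = C / C.max()

a = np.full(5, 1 / 5)              # uniform weights on x
b = np.full(7, 1 / 7)              # uniform weights on y
P, cost = sinkhorn(a, b, C)
print("row marginal error:", np.abs(P.sum(axis=1) - a).max())
print("entropic transport cost:", cost)
```

Each iteration involves only matrix-vector products, which is one reason this family of methods maps well onto GPUs. Smaller values of `eps` bring the plan closer to unregularized OT but slow convergence and can underflow in this naive form; log-domain updates are the usual remedy.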
Our main theoretical findings include the first tractable approach to computing optimal transport distances in high dimensions. We have obtained sharp quantitative estimates in terms of both computational time and sample complexity. Our main numerical contributions are new ways to represent transportation plans, using kernel expansions of the associated potentials. These two classes of results work hand in hand and are implemented in efficient Python numerical codes, which can be used in conjunction with deep learning pipelines to train generative models.
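For readers unfamiliar with the term "potentials", one standard way to write the entropic OT problem and its dual is sketched below. The notation (measures $\alpha$, $\beta$, cost $c$, regularization $\varepsilon$, potentials $f$, $g$, kernel $k$, weights $w_i$, samples $x_i$) is assumed here for illustration and is not taken from the report; the project's exact formulations may differ.

```latex
% Entropic-regularized OT between probability measures \alpha and \beta
W_\varepsilon(\alpha,\beta)
  = \min_{\pi \in \Pi(\alpha,\beta)}
    \int c \,\mathrm{d}\pi
    + \varepsilon\,\mathrm{KL}(\pi \,|\, \alpha\otimes\beta)

% Dual problem: an expectation over independent samples x ~ \alpha, y ~ \beta
W_\varepsilon(\alpha,\beta)
  = \max_{f,\,g}\;
    \mathbb{E}_{(x,y)\sim\alpha\otimes\beta}
    \Big[ f(x) + g(y)
      - \varepsilon\big(e^{(f(x)+g(y)-c(x,y))/\varepsilon} - 1\big) \Big]

% The transport plan is recovered from the optimal potentials
\frac{\mathrm{d}\pi}{\mathrm{d}(\alpha\otimes\beta)}(x,y)
  = e^{(f(x)+g(y)-c(x,y))/\varepsilon}

% A kernel expansion of a potential (one possible parameterization)
f(x) = \sum_{i} w_i \, k(x_i, x)
```

Because the dual objective is an expectation over samples, it can be maximized with stochastic gradient steps on a stream of draws from $\alpha$ and $\beta$, and a kernel expansion gives the potentials, and hence the plan, a finite representation that grows with the samples seen.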