Periodic Reporting for period 2 - NORIA (Numerical Optimal tRansport for ImAging)
Reporting period: 2019-04-01 to 2020-09-30
The NORIA project develops new computational and theoretical methods for machine learning. These advances are based on the theory of Optimal Transport. Advancing the front of research in deep network generative models requires tools to compare high-dimensional distributions (the training data sets and the "deep fakes" generated by the models) in a statistically meaningful way. This is crucial to achieve high-quality generation of images, videos and texts, and it is also important to control possible biases during the generation process. Scaling Optimal Transport to high-dimensional learning is thus a major research topic, as it is the most direct way to achieve these goals.
Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far
The first phase of the NORIA project has developed stochastic methods to scale optimal transport to high-dimensional learning. These methods improve on existing ones by orders of magnitude, both in their ability to exploit GPU computational architectures and in their statistical efficiency in high dimension. The two core ideas developed by the team to achieve these goals are (i) an entropic regularization of optimal transport problems and (ii) online stochastic optimization methods that can cope with streams of samples.
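As an illustration of idea (i), entropic regularization turns the optimal transport problem into a matrix-scaling problem solvable by Sinkhorn iterations, whose only operations are matrix-vector products well suited to GPUs. Below is a minimal NumPy sketch of this standard scheme; the toy 1-D histograms, grid and regularization strength are illustrative choices, not the project's actual code.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropic-regularized OT between histograms a, b with cost matrix C.

    Alternately rescales the rows and columns of the Gibbs kernel
    K = exp(-C / eps) until the transport plan P = diag(u) K diag(v)
    has (approximately) the prescribed marginals a and b.
    """
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # enforce column marginals
        u = a / (K @ v)              # enforce row marginals
    return u[:, None] * K * v[None, :]

# toy example: two Gaussian-like histograms on a 1-D grid
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-(x - 0.3) ** 2 / 0.01); a /= a.sum()
b = np.exp(-(x - 0.7) ** 2 / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2   # squared-distance cost

P = sinkhorn(a, b, C)                # plan whose marginals match a and b
```

The scheme scales well because each iteration costs only two matrix-vector products, and the matrix `K` can be applied in batch on a GPU.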
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)
Our main theoretical findings include the first tractable approach to computing optimal transport distances in high dimension. We have obtained sharp quantitative estimates both in terms of computational time and in terms of sample complexity. Our main numerical contributions are new ways to represent transportation plans, using kernel expansions of the associated dual potentials. These two classes of results work hand in hand and are implemented in efficient Python numerical codes, which can be used in conjunction with deep learning pipelines to train generative models.
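The interplay between potentials and streaming optimization can be illustrated on the semi-dual formulation of entropic optimal transport: the dual potential on a discrete target measure is maximized by stochastic gradient ascent, processing one source sample at a time, so that streams of samples are handled online. The sketch below uses toy 1-D measures and illustrative step sizes; it is a simplified illustration of this class of methods, not the project's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# discrete target measure: points y_j with uniform weights b_j
y = np.linspace(0.0, 1.0, 20)
b = np.full(20, 1.0 / 20)
eps = 0.05                 # entropic regularization strength

def grad_sample(x, v):
    """Stochastic gradient of the semi-dual objective at one source sample x.

    The gradient in v is b minus the softmax weights exp((v_j - c(x, y_j)) / eps) b_j
    (normalized), computed in a numerically stable way.
    """
    c = (x - y) ** 2                          # squared cost to each target point
    w = np.log(b) + (v - c) / eps
    w = np.exp(w - w.max()); w /= w.sum()     # stable softmax weights
    return b - w

v = np.zeros_like(y)       # dual potential on the target points
lr = 1.0
for t in range(1, 5001):
    x = rng.normal(0.5, 0.1)                  # online stream of source samples
    v += (lr / np.sqrt(t)) * grad_sample(x, v)
```

At convergence, `v` approximates the entropic dual potential between the sampled source distribution and the discrete target, without ever forming a full cost matrix over the source samples.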