Foundations of Geometric Statistics and Their Application in the Life Sciences

Project information

G-Statistics

Grant agreement ID: 786854


DOI: 10.3030/786854

Project closed

EC signature date: 16 May 2018

Start date: 1 September 2018

End date: 31 August 2024

Funded under: EXCELLENT SCIENCE - European Research Council (ERC)

Total cost: €2,183,583.75

EU contribution: €2,183,583.75

Coordinated by: INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE, France

Periodic Reporting for period 4 - G-Statistics (Foundations of Geometric Statistics and Their Application in the Life Sciences)

Reporting period: 2023-03-01 to 2024-08-31

Geometry is a foundational mathematical aspect of many theories in physics. In domains such as the life sciences, the elementary laws are less obvious and need to be established statistically. Many different invariances can be considered for this purpose, and one of the goals is to identify the most convenient geometry for the analyzed data. However, non-linearities due to the geometry are often neglected, and statistical estimation is most often performed in a Euclidean space, thus hiding the potential effects of non-linearities and singularities.

The goal of geometric statistics is to develop a rigorous statistical theory on manifolds and, more generally, on spaces with a geometric structure. This project aimed to strengthen its mathematical foundations and to exemplify their impact on selected applications in the life sciences. In G-Statistics, we explored foundational methods that unify statistical estimation theories on Riemannian manifolds with other geometric structures that naturally arise in applications, such as Lie groups, affine connection spaces, and quotient and stratified spaces. Beyond the mathematical theory, we aimed to provide generic but effective implementations of most geometric statistics methods, able to use specific implementations of most of the geometric structures considered. We illustrate our methods in computational anatomy applications, with the study of anatomical shapes and the forecasting of their evolution.

The methodological foundations of statistics on Riemannian manifolds were formalized and disseminated in the reference book “Riemannian Geometric Statistics in Medical Image Analysis” [7]. We notably extended the methodology to affine connection spaces in the context of Lie groups [15], with the canonical bi-invariant Cartan-Schouten connection, and to statistical estimation in quotient spaces [46]. To understand the impact of curvature, we developed in [40] new coordinate-free and tensorial Taylor expansions, which provide polynomial approximations of geodesic triangles at any order. A first major outcome was the analysis of the numerical accuracy of discrete ladder algorithms for parallel transport on manifolds, with exact or approximated geodesics [12]. A second major result was the quantification of the impact of manifold curvature on the estimation of the empirical Fréchet mean. We showed an unexpected bias of the empirical mean in 1/n and a modulation of the convergence rate of the covariance matrix proportional to the covariance-curvature tensor. These results unveil an intermediate behavior of the empirical mean in manifolds, between stickiness and smeariness: to estimate a quantity up to a given uncertainty, one generally needs more samples in a positively curved manifold (and, respectively, fewer samples in a negatively curved one) than in a Euclidean space. Finally, the empirical versus population estimation of summary statistics such as the Fréchet p-mean was rephrased as a geometric projection in a Wasserstein space [39] and generalized to polymeans (as in the k-means algorithm). This geometrization of statistics made it possible to generalize some of the asymptotic properties to more general geometric structures such as length spaces.
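The Fréchet mean can be illustrated with a minimal sketch of the classical fixed-point algorithm (Riemannian gradient descent on the variance), x ← exp_x((1/n) Σ_i log_x(p_i)). The sketch works on the unit sphere S² in R³, where the exponential and logarithm maps have closed forms. The function names are hypothetical and this is not the project's implementation. It assumes all points lie in a small geodesic ball, well within the injectivity radius, where the Fréchet mean is unique.

```python
import numpy as np


def sphere_exp(base, tangent):
    """Exponential map on the unit sphere: follow the great circle from base along tangent."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base.copy()
    return np.cos(norm) * base + np.sin(norm) * tangent / norm


def sphere_log(base, point):
    """Logarithm map on the unit sphere (assumes point is not antipodal to base)."""
    cos_angle = np.clip(np.dot(base, point), -1.0, 1.0)
    direction = point - cos_angle * base  # component of point orthogonal to base
    norm = np.linalg.norm(direction)
    if norm < 1e-12:
        return np.zeros_like(base)
    return np.arccos(cos_angle) * direction / norm


def frechet_mean(points, n_iter=100, tol=1e-10):
    """Empirical Fréchet mean via the fixed-point iteration x <- exp_x(mean_i log_x(p_i))."""
    mean = points[0].copy()
    for _ in range(n_iter):
        # The gradient of the variance is -2 * mean_log, so this update is a
        # Riemannian gradient step of size 1/2.
        mean_log = np.mean([sphere_log(mean, p) for p in points], axis=0)
        if np.linalg.norm(mean_log) < tol:
            break
        mean = sphere_exp(mean, mean_log)
    return mean
```

Only `sphere_exp` and `sphere_log` are specific to the sphere. The descent loop itself works unchanged with the exponential and logarithm maps of any other manifold.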

Beyond the mean, we investigated non-parametric submanifold learning techniques that properly generalize principal flows to more than one dimension. The main obstruction is that the tangent spaces estimated with local PCA do not generate a submanifold but rather a non-integrable field of subspaces (a geometric distribution), which we call the Principal Bundle [32,37]. Despite the absence of a submanifold, we can still compute distances between the points of the underlying point cloud that respect this geometry, using the proper notion of sub-Riemannian geodesics. This method, which works in any manifold and for any dimension and codimension, achieves impressive results on very noisy point clouds sampled on a 2D surface in 3D. It is a very promising technique for geometry processing in computer graphics and for data analysis in high-dimensional spaces. We also developed a new theory of affine maps in manifolds, which paves the way for generalizing algorithms such as Locally Linear Embedding (LLE) to Riemannian manifolds [31,49]. Finally, we revisited standard dimension reduction techniques such as probabilistic PCA with flag spaces: we showed that the resulting Principal Subspace Analysis provides a principled family of models that is much simpler and more interpretable than usual PCA modes, while remaining as efficient as other state-of-the-art methods [51].
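The local PCA step behind the Principal Bundle can be sketched in a few lines. For each point of the cloud, the top right singular vectors of its centred k-nearest-neighbour patch estimate a d-dimensional tangent subspace, and the collection of these subspaces is the field of subspaces described above. This is an illustrative NumPy sketch under our own assumptions: brute-force neighbour search, unweighted neighbours, and a hypothetical function name. It does not compute the sub-Riemannian geodesics of [32,37].

```python
import numpy as np


def local_pca_tangents(points, k, dim):
    """Estimate a dim-dimensional tangent basis at each point from its k nearest neighbours.

    Returns an array of shape (n_points, dim, ambient_dim) whose rows are orthonormal.
    """
    # Brute-force pairwise squared distances (n x n); only suitable for small clouds.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    bases = []
    for i in range(len(points)):
        # The k closest points, including the point itself (distance 0).
        neighbours = points[np.argsort(sq_dist[i])[:k]]
        centred = neighbours - neighbours.mean(axis=0)
        # Right singular vectors, sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        bases.append(vt[:dim])
    return np.array(bases)
```

On noisy data, k must be large enough to average out the noise but small enough for each patch to remain nearly flat. For a surface in 3D, one would call it with `dim=2`.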

For symmetric positive definite (SPD) matrices, which are used in a wide range of applications, we clarified the relationships between existing metrics by classifying them into main families based on their invariance properties [1,3,27,29]. We then investigated the quotient space of full-rank correlation matrices. The most natural affine-quotient metric has both negative and (unbounded) positive curvature [18], which can notably complicate computing the Riemannian logarithm by optimization. We therefore introduced computationally more convenient Hadamard (non-positively curved) or even log-Euclidean metrics, along with their geometric operations [28,45]. These new metrics may have very interesting applications in several areas, notably in neuroimaging, where brain networks extracted from fMRI data are parametrized by correlation matrices.
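Two classical SPD metrics covered by such classifications have closed-form distances, both computed in the sketch below with NumPy eigendecompositions:

- the affine-invariant distance d(A, B) = ‖log(A^{-1/2} B A^{-1/2})‖_F;
- the log-Euclidean distance d(A, B) = ‖log A − log B‖_F.

The sketch also includes the normalization Σ ↦ D^{-1/2} Σ D^{-1/2}, with D = diag(Σ). This maps an SPD matrix to its correlation matrix, that is, to the representative of its class under positive diagonal scaling. The function names are hypothetical, and the correlation-matrix metrics of [28,45] are not implemented here.

```python
import numpy as np


def _spd_fun(mat, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * fun(eigvals)) @ eigvecs.T


def affine_invariant_distance(a, b):
    """d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F."""
    a_inv_sqrt = _spd_fun(a, lambda w: 1.0 / np.sqrt(w))
    middle = a_inv_sqrt @ b @ a_inv_sqrt
    middle = 0.5 * (middle + middle.T)  # remove round-off asymmetry before eigh
    return np.linalg.norm(_spd_fun(middle, np.log), "fro")


def log_euclidean_distance(a, b):
    """d(A, B) = || log(A) - log(B) ||_F."""
    return np.linalg.norm(_spd_fun(a, np.log) - _spd_fun(b, np.log), "fro")


def to_correlation(cov):
    """Quotient by positive diagonal scaling: C = D^{-1/2} S D^{-1/2}, with D = diag(S)."""
    inv_std = 1.0 / np.sqrt(np.diag(cov))
    return cov * np.outer(inv_std, inv_std)
```

The affine-invariant distance is unchanged under any congruence Σ ↦ AΣAᵀ with invertible A. The log-Euclidean distance is only invariant under orthogonal congruences and scalings. This difference is an example of the invariance properties used to group metrics into families.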

From the technological point of view, we contributed to the development of the Python package geomstats (https://geomstats.github.io/), a generic library of statistical computing algorithms on different geometric structures [11,13]. The package currently supports more than 15 manifolds, with closed-form geodesics when these are known and discrete geodesics obtained by optimization otherwise. It embeds complex notions of Riemannian geometry into a consistent object-oriented API that keeps the code readable and editable by mathematicians. From an applied perspective, using the algorithms does not require a deep understanding of the mathematics, and a standard scikit-learn interface makes them easy to use in artificial intelligence pipelines. The library is now widely used in geometric statistics and, more broadly, in data science, and it raises growing interest in the machine learning community, as demonstrated by the publication of the monograph [22] in the series Foundations and Trends in Machine Learning.
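When no closed-form geodesic is available, a geodesic between two points can be approximated by optimization. The toy sketch below illustrates this idea on the unit sphere, where the closed-form answer (spherical linear interpolation, `slerp`) is known and serves as a check. It minimizes the discrete path energy Σ‖x_{i+1} − x_i‖² over the interior points, with the endpoints fixed, by projected gradient descent with step 1/4: each update averages the two neighbours, then renormalizes onto the sphere. The function names and the choice of energy are assumptions for illustration; this is not the optimization scheme used inside geomstats.

```python
import numpy as np


def slerp(a, b, t):
    """Closed-form great-circle geodesic between unit vectors a and b at time t in [0, 1]."""
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)


def discrete_geodesic(a, b, n_segments, n_iter=2000):
    """Approximate the geodesic from a to b by minimizing sum ||x_{i+1} - x_i||^2.

    Projected gradient descent with step 1/4: the gradient step reduces to averaging
    the two neighbours, and the projection renormalizes onto the unit sphere.
    """
    t = np.linspace(0.0, 1.0, n_segments + 1)[:, None]
    path = (1 - t) * a + t * b  # straight-line initialization in the ambient space
    path /= np.linalg.norm(path, axis=1, keepdims=True)
    for _ in range(n_iter):
        # Gradient step for all interior points at once (computed from the old path).
        interior = 0.5 * (path[:-2] + path[2:])
        # Projection back onto the sphere; the endpoints path[0], path[-1] stay fixed.
        path[1:-1] = interior / np.linalg.norm(interior, axis=1, keepdims=True)
    return path
```

The minimizer of this energy is the set of equally spaced points on the great-circle arc. The discrete path therefore converges to samples of the closed-form geodesic.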

We illustrated our methods in real-world computational anatomy applications with the statistical modeling of cardiac motion across subjects. The geodesic regression of heart motion in a group of diffeomorphisms was parallel transported along the inter-subject deformation in order to perform group statistics on all trajectories in the same reference anatomy. For the right ventricle under pressure or volume overload, decoupling the volume change from the deformation directly within the metric on diffeomorphisms revealed statistical insights into the dynamics of each disease [16,23,25]. A similar methodology, using the Cartan-Schouten connection instead of a right-invariant metric connection on diffeomorphisms, was applied to the assessment of treatment effects on longitudinal brain changes in the Multidomain Alzheimer Preventive Trial cohort [10].
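Parallel transport is the operation that moves each subject's trajectory into the reference anatomy. The ladder schemes analysed in [12] approximate it with geodesic constructions only. The sketch below implements a single rung of the pole ladder on the unit sphere rather than on diffeomorphisms:

1. Take the midpoint of the geodesic from x0 to x1.
2. Reflect the endpoint of the vector through that midpoint (geodesic symmetry).
3. Read off the transported vector at x1 with a log map and a sign flip.

The result is compared with the closed-form parallel transport along the great circle. The sphere is a symmetric space, where one pole-ladder rung is known to be exact, so both results should agree up to round-off. In general manifolds the scheme is only an approximation. The function names and the single-rung setup are hypothetical and not taken from the project's code.

```python
import numpy as np


def sphere_exp(base, tangent):
    """Exponential map on the unit sphere."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base.copy()
    return np.cos(norm) * base + np.sin(norm) * tangent / norm


def sphere_log(base, point):
    """Logarithm map on the unit sphere (assumes point is not antipodal to base)."""
    cos_angle = np.clip(np.dot(base, point), -1.0, 1.0)
    direction = point - cos_angle * base
    norm = np.linalg.norm(direction)
    if norm < 1e-12:
        return np.zeros_like(base)
    return np.arccos(cos_angle) * direction / norm


def pole_ladder(x0, x1, v):
    """One pole-ladder rung transporting tangent vector v from x0 to x1."""
    midpoint = sphere_exp(x0, 0.5 * sphere_log(x0, x1))
    tip = sphere_exp(x0, v)  # endpoint of the geodesic shot from x0 along v
    # Geodesic symmetry through the midpoint: p -> exp_m(-log_m(p)).
    reflected = sphere_exp(midpoint, -sphere_log(midpoint, tip))
    # The symmetry reverses orientation, hence the minus sign.
    return -sphere_log(x1, reflected)


def exact_transport(x0, x1, v):
    """Closed-form parallel transport of v (tangent at x0) along the great circle to x1."""
    log = sphere_log(x0, x1)
    theta = np.linalg.norm(log)
    u = log / theta  # unit initial velocity of the geodesic
    coeff = np.dot(v, u)  # only the component of v along u rotates
    return v + coeff * (np.cos(theta) - 1.0) * u - coeff * np.sin(theta) * x0
```

Replacing the sphere's exp and log maps with approximate ones is where the numerical errors studied in [12] would appear.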

Statistics on diffeomorphisms to model the evolution of the brain in Alzheimer's disease

Discrete Schild's ladder and pole ladder schemes for parallel transport on manifolds
