Periodic Reporting for period 3 - GRAPHMODE (Graphical Models for Complex Multivariate Data)
Reporting period: 2023-10-01 to 2025-03-31
The statistical models and inferential methods developed and analyzed in the project are useful for a number of different applications. These range from inference of regulatory networks in (computational) biology on the basis of gene expression data to observational patient studies in psychology. The new techniques allow one to determine fundamental limits of what can be learned from imperfect data, and they provide concrete tools to empirically explore interactions and cause-effect relations among larger sets of variables. In particular, the tools support the generation of scientific hypotheses.
Classical work in the area of graphical models relies heavily on the statistical assessment of conditional independences between the studied variables. A main objective of the project is to advance the area by studying and statistically exploiting more general types of constraints on the variables' joint probability distribution. Alongside theoretical exploration of such constraints, the project seeks to develop concrete improved methods for testing hypotheses that have been formulated in terms of such constraints and to design new methods for learning causal models from data.
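As a baseline for what the statistical assessment of conditional independence involves, the following minimal sketch (our illustration, not a method developed in the project) tests whether X is independent of Y given Z via partial correlation, under an assumed linear/Gaussian model:

```python
import numpy as np
from scipy import stats

def partial_corr_test(x, y, z):
    """Test X independent of Y given Z via partial correlation
    (assumes approximately linear/Gaussian data)."""
    Z = np.column_stack([np.ones(len(z)), z])
    # Residualize x and y on Z, then correlate the residuals.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    n, k = len(x), Z.shape[1] - 1             # k conditioning variables
    zstat = np.sqrt(n - k - 3) * np.arctanh(r)  # Fisher z-transform
    return r, 2 * stats.norm.sf(abs(zstat))     # two-sided p-value

# Toy chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
z = x + rng.standard_normal(n)
y = z + rng.standard_normal(n)
r_xy = np.corrcoef(x, y)[0, 1]            # marginal correlation: clearly nonzero
r_part, p = partial_corr_test(x, y, z)    # partial correlation: close to zero
```

The project's more general constraint classes go beyond what such pairwise conditional independence tests can express.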
In the area of parameter identifiability, the team has developed new methods to decide whether the parameters of a given graphical model are identifiable (estimable) when the model features latent variables and/or feedback loops. In contrast to prior work that exploits conditional independence relations, the new criteria leverage low-rank structure induced by latent variables.
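The role of low-rank structure can be seen in a toy factor model (an illustrative sketch, not the project's criteria): a single latent variable driving several observed variables forces cross-covariance submatrices between disjoint groups of observed variables to have rank one, which is detectable from data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# One latent factor L drives four observed variables with distinct loadings.
L = rng.standard_normal(n)
lam = np.array([1.0, 0.8, -0.6, 1.2])
X = np.outer(L, lam) + 0.5 * rng.standard_normal((n, 4))

S = np.cov(X, rowvar=False)
# Cross-covariance between {X1,X2} and {X3,X4}: rank 1 under the latent model,
# because the independent noise does not contribute to off-diagonal blocks.
B = S[:2, 2:]
svals = np.linalg.svd(B, compute_uv=False)
# First singular value is large; the second is close to zero.
```

Rank constraints of this kind hold even though no conditional independence among the observed variables is available.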
For linear causal models, we determined novel algebraic relations among the low-order moments. These moment relations have been used to design new algorithms for causal discovery in models with non-Gaussian or homoscedastic noise. Additionally, the results yield new insights into when different networks of cause-effect relations can be distinguished empirically in settings where not all of the relevant variables can be observed.
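To give a flavour of how low-order moments can reveal causal direction in linear non-Gaussian models, the following sketch illustrates a classical LiNGAM-type observation (for illustration only; it is not one of the project's new algorithms): when x causes y, the regression residual in the causal direction is independent of the regressor, while in the anti-causal direction it is not, and with skewed noise this already shows up in third-order statistics such as the correlation between the regressor and the squared residual:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
# Ground truth: x -> y, with a skewed (non-Gaussian) cause.
x = rng.exponential(1.0, n) - 1.0           # mean 0, skewness 2
y = 0.8 * x + 0.5 * rng.standard_normal(n)  # Gaussian noise
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

def resid(target, regressor):
    """OLS residual of target regressed on regressor (both standardized)."""
    return target - np.corrcoef(target, regressor)[0, 1] * regressor

# Third-order dependence between regressor and squared residual.
c_fwd = abs(np.corrcoef(x, resid(y, x) ** 2)[0, 1])  # causal direction
c_bwd = abs(np.corrcoef(y, resid(x, y) ** 2)[0, 1])  # anti-causal direction
# c_fwd is near zero while c_bwd is clearly positive: infer x -> y.
```

With Gaussian noise throughout, both measures would vanish and the direction would not be identifiable from second moments alone, which is exactly the gap that higher-order moment relations close.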
The existing theory and methodology for graphical models is developed primarily for recursive systems, i.e. systems that are free of feedback loops. As a novel approach that more easily accommodates the presence of feedback loops, we are studying new classes of graphical models that are derived from dynamical processes. Our work clarifies identifiability of such models and investigates approaches to learn the models from data that lack temporal resolution.
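One concrete way such process-derived models can arise (a sketch for illustration, under the assumption of a linear Ornstein–Uhlenbeck process) is that the covariance of the observed cross-sectional data is the stationary covariance of the process, which solves a Lyapunov equation in a drift matrix encoding the, possibly cyclic, causal structure:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Stable drift matrix encoding a feedback loop between variables 1 and 2.
A = np.array([[-1.0, 0.5, 0.0],
              [0.3, -1.0, 0.0],
              [0.0, 0.4, -1.0]])
C = np.eye(3)  # diffusion (noise) matrix of the process

# The stationary covariance S solves the Lyapunov equation A S + S A^T + C = 0;
# scipy's solver uses the convention A X + X A^H = Q, hence the sign flip.
S = solve_continuous_lyapunov(A, -C)
# S is symmetric positive definite and encodes the equilibrium of the dynamics.
```

Learning the model then amounts to recovering the sparsity pattern of the drift matrix from an estimate of the stationary covariance, even though no time-resolved observations of the process are available.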
Data-driven science often faces the difficulty that, when investigating a hypothesized causal effect, not only the numerical effect but also the causal model/structure has to be estimated. We developed prototype methods that allow one to rigorously account for uncertainty about the direction and specific nature of cause-effect relations in statistical inference about the causal effect of one variable on another.
In causal discovery, causal directions are often determined by predicting a variable from its putative causes and assessing statistically whether the prediction errors are independent of the causes. To support this approach, we worked on new approaches to formulate measures of dependence that can consistently detect non-linear dependences among random vectors, and we showed how to leverage them in formal tests of independence.
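A standard dependence measure of the kind referred to here is distance correlation (Székely et al.), whose population value is zero exactly under independence and which detects non-linear dependence among random vectors. The following minimal sketch (a textbook construction, not the project's new measures) computes the sample version:

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation; the population value is 0
    if and only if x and y are independent."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    def centered(z):
        # Pairwise Euclidean distances, double-centered.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()  # squared sample distance covariance (nonnegative)
    return np.sqrt(max(dcov2, 0.0) / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(3)
n = 1000
u = rng.standard_normal(n)
v = u ** 2 + 0.3 * rng.standard_normal(n)  # non-linear dependence, ~zero Pearson corr
w = rng.standard_normal(n)                 # independent of u
d_dep = dcor(u, v)  # clearly positive
d_ind = dcor(u, w)  # close to zero
```

In a residual-based causal discovery procedure, such a measure would be applied to the putative causes and the prediction errors; a formal test can be obtained by recomputing the statistic under permutations of one sample.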
We expect a series of further results that, in particular, generalize our identifiability criteria, improve our ability to learn feedback mechanisms, broaden the scope of applicability of our methods for uncertainty quantification, tackle non-linear causal models, and open up new ways to perform goodness-of-fit tests for graphical models.