Periodic Reporting for period 3 - GRAPHMODE (Graphical Models for Complex Multivariate Data)
Reporting period: 2023-10-01 to 2025-03-31
The statistical models and inferential methods developed and analyzed in the project are useful for a number of different applications. These range from inference of regulatory networks in (computational) biology on the basis of gene expression data to observational patient studies in psychology. The new techniques allow one to determine fundamental limits of what can be learned from imperfect data, and they provide concrete tools to empirically explore interactions and cause-effect relations among larger sets of variables. In particular, the tools support the generation of scientific hypotheses.
Classical work in the area of graphical models relies heavily on the statistical assessment of conditional independences between the studied variables. A main objective of the project is to advance the area by studying and statistically exploiting more general types of constraints on the variables' joint probability distribution. Alongside theoretical exploration of such constraints, the project seeks to develop concrete improved methods for testing hypotheses that have been formulated in terms of such constraints and to design new methods for learning causal models from data.
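As a baseline for what the statistical assessment of conditional independence involves, the following minimal sketch (our illustration, not a method developed in the project) tests whether X is independent of Y given Z via partial correlation, under an assumed linear/Gaussian model:

```python
import numpy as np
from scipy import stats

def partial_corr_test(x, y, z):
    """Test X independent of Y given Z via partial correlation
    (assumes approximately linear/Gaussian data)."""
    Z = np.column_stack([np.ones(len(z)), z])
    # Residualize x and y on Z, then correlate the residuals.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    n, k = len(x), Z.shape[1] - 1             # k conditioning variables
    zstat = np.sqrt(n - k - 3) * np.arctanh(r)  # Fisher z-transform
    return r, 2 * stats.norm.sf(abs(zstat))     # two-sided p-value

# Toy chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
z = x + rng.standard_normal(n)
y = z + rng.standard_normal(n)
r_xy = np.corrcoef(x, y)[0, 1]            # marginal correlation: clearly nonzero
r_part, p = partial_corr_test(x, y, z)    # partial correlation: close to zero
```

The project's more general constraint classes go beyond what such pairwise conditional independence tests can express.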
In the area of parameter identifiability, the team has developed new methods to decide whether the parameters of a given graphical model are identifiable (estimable) when the model features latent variables and/or feedback loops. In contrast to prior work that exploits conditional independence relations, the new criteria leverage low-rank structure induced by latent variables.
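The role of low-rank structure can be seen in a toy factor model (an illustrative sketch, not the project's criteria): a single latent variable driving several observed variables forces cross-covariance submatrices between disjoint groups of observed variables to have rank one, which is detectable from data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# One latent factor L drives four observed variables with distinct loadings.
L = rng.standard_normal(n)
lam = np.array([1.0, 0.8, -0.6, 1.2])
X = np.outer(L, lam) + 0.5 * rng.standard_normal((n, 4))

S = np.cov(X, rowvar=False)
# Cross-covariance between {X1,X2} and {X3,X4}: rank 1 under the latent model,
# because the independent noise does not contribute to off-diagonal blocks.
B = S[:2, 2:]
svals = np.linalg.svd(B, compute_uv=False)
# First singular value is large; the second is close to zero.
```

Rank constraints of this kind hold even though no conditional independence among the observed variables is available.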
For linear causal models, we determined novel algebraic relations among the low-order moments. These moment relations have been used to design new algorithms for causal discovery in models with non-Gaussian or homoscedastic noise. Additionally, the results yield new insights into when different networks of cause-effect relations can be distinguished empirically in settings where not all of the relevant variables can be observed.
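To give a flavour of how low-order moments can reveal causal direction in linear non-Gaussian models, the following sketch illustrates a classical LiNGAM-type observation (for illustration only; it is not one of the project's new algorithms): when x causes y, the regression residual in the causal direction is independent of the regressor, while in the anti-causal direction it is not, and with skewed noise this already shows up in third-order statistics such as the correlation between the regressor and the squared residual:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
# Ground truth: x -> y, with a skewed (non-Gaussian) cause.
x = rng.exponential(1.0, n) - 1.0           # mean 0, skewness 2
y = 0.8 * x + 0.5 * rng.standard_normal(n)  # Gaussian noise
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

def resid(target, regressor):
    """OLS residual of target regressed on regressor (both standardized)."""
    return target - np.corrcoef(target, regressor)[0, 1] * regressor

# Third-order dependence between regressor and squared residual.
c_fwd = abs(np.corrcoef(x, resid(y, x) ** 2)[0, 1])  # causal direction
c_bwd = abs(np.corrcoef(y, resid(x, y) ** 2)[0, 1])  # anti-causal direction
# c_fwd is near zero while c_bwd is clearly positive: infer x -> y.
```

With Gaussian noise throughout, both measures would vanish and the direction would not be identifiable from second moments alone, which is exactly the gap that higher-order moment relations close.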
The existing theory and methodology for graphical models is developed primarily for recursive systems, i.e. systems that are free of feedback loops. As a novel approach that more easily accommodates the presence of feedback loops, we are studying new classes of graphical models that are derived from dynamical processes. Our work clarifies identifiability of such models and investigates approaches to learn the models from data that lack temporal resolution.
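One concrete way such process-derived models can arise (a sketch for illustration, under the assumption of a linear Ornstein–Uhlenbeck process) is that the covariance of the observed cross-sectional data is the stationary covariance of the process, which solves a Lyapunov equation in a drift matrix encoding the, possibly cyclic, causal structure:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Stable drift matrix encoding a feedback loop between variables 1 and 2.
A = np.array([[-1.0, 0.5, 0.0],
              [0.3, -1.0, 0.0],
              [0.0, 0.4, -1.0]])
C = np.eye(3)  # diffusion (noise) matrix of the process

# The stationary covariance S solves the Lyapunov equation A S + S A^T + C = 0;
# scipy's solver uses the convention A X + X A^H = Q, hence the sign flip.
S = solve_continuous_lyapunov(A, -C)
# S is symmetric positive definite and encodes the equilibrium of the dynamics.
```

Learning the model then amounts to recovering the sparsity pattern of the drift matrix from an estimate of the stationary covariance, even though no time-resolved observations of the process are available.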
Data-driven science often faces the difficulty that, when investigating a hypothesized causal effect, not only the numerical effect but also the causal model/structure has to be estimated. We developed prototype methods that allow one to rigorously account for uncertainty about the direction and specific nature of cause-effect relations in statistical inference about the causal effect of one variable on another.
In causal discovery, causal directions are often determined by predicting a variable from its putative causes and assessing statistically whether the prediction errors are independent of the causes. To support this approach, we worked on new approaches to formulate measures of dependence that can consistently detect non-linear dependences among random vectors, and we showed how to leverage them in formal tests of independence.
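A standard dependence measure of the kind referred to here is distance correlation (Székely et al.), whose population value is zero exactly under independence and which detects non-linear dependence among random vectors. The following minimal sketch (a textbook construction, not the project's new measures) computes the sample version:

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation; the population value is 0
    if and only if x and y are independent."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    def centered(z):
        # Pairwise Euclidean distances, double-centered.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()  # squared sample distance covariance (nonnegative)
    return np.sqrt(max(dcov2, 0.0) / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(3)
n = 1000
u = rng.standard_normal(n)
v = u ** 2 + 0.3 * rng.standard_normal(n)  # non-linear dependence, ~zero Pearson corr
w = rng.standard_normal(n)                 # independent of u
d_dep = dcor(u, v)  # clearly positive
d_ind = dcor(u, w)  # close to zero
```

In a residual-based causal discovery procedure, such a measure would be applied to the putative causes and the prediction errors; a formal test can be obtained by recomputing the statistic under permutations of one sample.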
We expect a series of further results that, in particular, generalize our identifiability criteria, improve our ability to learn feedback mechanisms, broaden the scope of applicability of our methods for uncertainty quantification, tackle non-linear causal models, and open up new ways to perform goodness-of-fit tests for graphical models.