Robust Causal Discovery

Project information

ROCDISCO

Grant agreement ID: 101103017

DOI

10.3030/101103017

Project closed on 9 February 2024

EC signature date: 20 March 2023

Start date: 1 June 2023

End date: 31 May 2025

Funded under

Marie Skłodowska-Curie Actions (MSCA)

Total cost

No data

EU contribution

€203,464.32

Coordinated by

UNIVERSITEIT MAASTRICHT
Netherlands

Periodic Reporting for period 1 - ROCDISCO (Robust Causal Discovery)

Reporting period: 2023-06-01 to 2025-05-31

Due to technological advances, the amount of available data has increased tremendously over the last decade. The fields of data science, statistics, computer science and econometrics have kept pace with this growth, as they provide indispensable tools for turning data into insight and knowledge. Whereas data science has traditionally been concerned with learning associations in data, it has recently become clear that causal relations often provide a deeper understanding and a more powerful tool in many practical applications. This has led to a flourishing of causal inference, with some of the most prestigious scientific awards going to pioneers in the field over the last decade.

“Can we learn causal mechanisms from observational data?” is one of the compelling questions occupying scientists all over the world. Although it was originally met with skepticism, it has become clear that we are not completely powerless: under the right conditions, causal structure can indeed be inferred from observational data. However, all current methods assume that the observed data perfectly follows the underlying causal structure. Unfortunately, real-world data is often contaminated by anomalies and measurement errors, which violate this assumption and weaken the reliability of causal discovery methods.

This proposal aims to fill this gap by developing methods for causal discovery that remain efficient and reliable even when the data is not well behaved. Its strong methodological focus aims to advance the theoretical and empirical understanding of causal discovery, as well as to provide a versatile toolbox that helps scientists doing causal discovery improve the reliability of their findings.

The project’s overall goal is to provide tools for robust causal discovery. Two specific research objectives have been tackled:

1. Robust causal discovery in the LiNGAM model. The project has developed a new algorithm, TSLiNGAM, designed to estimate LiNGAM structures from data containing extreme observations (Leyder, Raymaekers, Verdonck (2023)). TSLiNGAM provably identifies the LiNGAM structure and is more robust to extreme observations than existing alternatives. It also outperforms competing methods when the data contains many skewed variables.

2. Robust measurement of independence. The project has developed a novel approach to robustly measuring (in)dependence between variables, called the biloop distance correlation (Leyder, Raymaekers, Rousseeuw (2024)). Measuring independence is a cornerstone of causal discovery and is useful in many other applications. The biloop distance correlation is a measure of dependence (i.e. it is zero if and only if the variables are independent) with a continuously redescending influence function. This is achieved by mapping the input variables into a higher-dimensional space in which both properties can be attained jointly.
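To make the first objective more concrete: DirectLiNGAM estimates a causal order by repeatedly fitting simple linear regressions between pairs of variables, typically with ordinary least squares (OLS), and a single extreme observation can dominate such a fit. The "TS" in TSLiNGAM refers to the Theil-Sen estimator, which, as we read the cited preprint, takes the place of OLS in these pairwise regressions. The minimal Python sketch below does not implement TSLiNGAM itself; the function names are illustrative. It only compares the Theil-Sen slope (the median of all pairwise slopes) with the OLS slope on a clean line contaminated by one outlier.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen line fit: slope = median of all pairwise slopes,
    intercept = median of the residuals y_i - slope * x_i."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # pairs with tied x values have no defined slope
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def ols_slope(x, y):
    """Ordinary least-squares slope, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Clean line y = 2x + 1, with the last observation replaced by an extreme value.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[-1] = 1000.0

print(theil_sen(x, y))  # (2.0, 1.0)
print(ols_slope(x, y))  # about 55.5
```

Only 9 of the 45 pairwise slopes involve the outlier, so their median is unaffected and the Theil-Sen fit recovers the clean line, whereas the OLS slope is pulled far away from the true value 2.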
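For the second objective, it helps to recall the classical sample distance correlation of Székely, Rizzo and Bakirov, which the biloop distance correlation builds on, judging from its name and the cited preprint. The Python sketch below computes this classical statistic from double-centred pairwise distance matrices and shows that, unlike the Pearson correlation, it detects the purely nonlinear dependence y = x². The helper names are hypothetical, and the sketch does not reproduce the biloop mapping into a higher-dimensional space, so it has none of the robustness properties described above.

```python
from math import sqrt


def double_centered_distances(v):
    """Matrix of |v_i - v_j| with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    means = [sum(row) / n for row in d]  # row means equal column means (d is symmetric)
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def dcov2(A, B):
    """Squared sample distance covariance (V-statistic): average of A_ij * B_ij."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2


def distance_correlation(x, y):
    """Classical (non-robust) sample distance correlation of two 1-D samples."""
    A, B = double_centered_distances(x), double_centered_distances(y)
    denom = sqrt(dcov2(A, A) * dcov2(B, B))
    if denom == 0:
        return 0.0  # at least one sample is constant
    return sqrt(max(dcov2(A, B), 0.0) / denom)  # clip tiny negative rounding errors


def pearson(x, y):
    """Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y depends on x, but not linearly
print(pearson(x, y))               # 0.0
print(distance_correlation(x, y))  # about 0.52
```

The Pearson correlation is exactly zero here because the dependence is symmetric around zero, while the distance correlation is clearly positive, which is the "zero if and only if independent" property at work in a finite sample.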

Leyder, S., Raymaekers, J. and Rousseeuw, P.J. (2024), “Is Distance Correlation Robust?”, arXiv preprint 2403.03722, https://arxiv.org/abs/2403.03722.
Leyder, S., Raymaekers, J. and Verdonck, T. (2023), “TSLiNGAM: DirectLiNGAM under heavy tails”, arXiv preprint 2308.05422, https://arxiv.org/abs/2308.05422.

Causal discovery lies at the heart of many scientific disciplines, and methods for causal discovery from observational data are gaining popularity. This project therefore has the potential for a broad impact reaching far beyond the foundational disciplines of statistics, economics and computer science, into fields such as epidemiology, omics, physics, chemometrics and economic policy.

Robustness in LiNGAM models: TSLiNGAM performs reliably under contamination
