Periodic Reporting for period 1 - ROCDISCO (Robust Causal Discovery)
Reporting period: 2023-06-01 to 2025-05-31
“Can we learn causal mechanisms from observational data?” is one of the compelling questions occupying scientists all over the world. Although it was originally met with skepticism, it has become clear that we are not completely powerless: under the right conditions, causal structure can indeed be inferred from observational data. However, all of the current methods assume that the observed data perfectly follow the underlying causal structure. Unfortunately, real-world data are often contaminated by anomalies and measurement errors, which violates this assumption and weakens the reliability of methods for causal discovery.
This proposal aims to fill this gap by developing methods for causal discovery that remain efficient and reliable even when the data are not so well-behaved. The strong methodological focus aims to advance the theoretical and empirical understanding of causal discovery, as well as to provide a versatile toolbox that helps scientists doing causal discovery improve the reliability of their findings.
1. Robust causal discovery in the LiNGAM model. The project has developed a new algorithm called TSLiNGAM, which is designed to estimate LiNGAM structures on data containing extreme observations (Leyder, Raymaekers, Verdonck (2023)). TSLiNGAM provably identifies the LiNGAM structure and is more robust to extreme observations than existing alternatives. TSLiNGAM also outperforms the competition when the data contain many skewed variables.
2. Robust measurements of independence. The project has developed a novel approach to robustly measuring (in)dependence between variables, called the biloop distance correlation (Leyder, Raymaekers, Rousseeuw (2024)). Measuring independence is a cornerstone of causal discovery and is useful in other applications as well. The biloop distance correlation is a measure of dependence (i.e. it is zero if and only if the variables are independent) that has a continuously redescending influence function. This is achieved by mapping the input variables into a higher-dimensional space, in which it is possible to achieve both properties jointly.
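To make the "zero if and only if independent" property in item 2 more concrete, the following is a minimal sketch of the classical sample distance correlation (Székely, Rizzo and Bakirov, 2007), on which the robust variant builds. It is not the biloop distance correlation itself: the biloop transformation is not reproduced here, and the function name `distance_correlation` is a hypothetical helper chosen for illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Classical (non-robust) sample distance correlation.

    Illustrative sketch only: this is the standard estimator, not the
    biloop variant described in the report.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of observations")
    # Treat 1-D inputs as n observations of a single variable.
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)

    # Pairwise Euclidean distance matrices within each sample.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar2_x = (A * A).mean()  # squared sample distance variance of x
    dvar2_y = (B * B).mean()  # squared sample distance variance of y

    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        # A constant variable carries no dependence information.
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2  # fully determined by x, yet linearly uncorrelated with it
    print("Pearson correlation :", np.corrcoef(x, y)[0, 1])
    print("Distance correlation:", distance_correlation(x, y))
```

In this example the Pearson correlation is essentially zero because the relationship is symmetric and nonlinear, while the distance correlation is clearly positive. Detecting this kind of dependence is what makes such measures useful as independence tests inside causal discovery algorithms.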
Leyder, S., Raymaekers, J. and Rousseeuw, P.J. (2024), “Is Distance Correlation Robust?”, arXiv preprint 2403.03722, https://arxiv.org/abs/2403.03722.
Leyder, S., Raymaekers, J. and Verdonck, T. (2023), “TSLiNGAM: DirectLiNGAM under heavy tails”, arXiv preprint 2308.05422, https://arxiv.org/abs/2308.05422.