Causality Relations Using Nonlinear Data Assimilation

Periodic Reporting for period 3 - CUNDA (Causality Relations Using Nonlinear Data Assimilation)

Reporting period: 2019-09-01 to 2021-02-28

A major problem in understanding complex nonlinear geophysical systems is to determine which processes drive which other processes, that is, what the causal relations are. Several methods to infer nonlinear causal relations exist, but they often lead to different answers, perform hypothesis testing on causality rather than estimating its strength, need long stationary time series, can be misleading if an unknown process drives the processes under study, or, if a numerical model is used, reflect model causality instead of real-world causality. Furthermore, methods that use the governing evolution equations directly lead to intractable high-dimensional integrals.

In this proposal we tackle these problems by firstly embedding causality into a Bayesian framework, moving from testing causality to estimating causality strength and its uncertainty in a systematic way. Knowledge from several causality methods can be combined in a consistent manner, new knowledge can be incorporated systematically into the existing knowledge base, and time series can be short. Secondly, a new formulation to infer causal strength that uses the evolution equations but avoids high-dimensional integrals will be explored. Thirdly, numerical models are combined with observations through fully nonlinear data assimilation to study real-world causality.

We will test the new techniques on simple models and then apply them to a high-resolution model of the ocean area around South Africa where the Southern Ocean, the Indian Ocean, and the Atlantic Ocean meet. This area plays a crucial role in the global circulation of heat and salt by bringing warm and salty Indian Ocean water into the Atlantic in a highly turbulent manner. The techniques allow us to infer what sets this interocean transport: the turbulent local dynamics or the global climate-related dynamics. This is crucial for understanding the functioning of the ocean in the climate system.

The main objective is to generate robust methods to determine causality strength in complex, highly nonlinear, high-dimensional geophysical systems, and to use these to solve the interocean exchange causality problem in the ocean area south of Africa.

This proposal will do this by first defining a new quantity called causality strength to be able to unify different approaches, by employing a Bayesian framework, and by using nonlinear data assimilation. This leads to the following specific objectives:
1) Generate a Bayesian framework for causality strength
2) Explore model equations in causality measures and explore high-dimensionality issues
3) Explore synchronisation and large deviation theory for nonlinear smoothers
4) Explore optimal transportation for nonlinear smoothers.
5) Combine all knowledge and determine the causality chain of the ocean problem.
We have made the following progress against the 5 objectives:
For objective 1) we have managed to put causality on a completely new footing by considering all possible contributions/drivers to a given process. This new framework allows for full specification of each driving process, including how processes have a combined influence on the target process. Interestingly, nonlinear multivariate interactions have been ignored up to now in causality estimation, while most physical, biological and other systems are strongly governed by this category of interactions. A paper highlighting this result will soon be submitted to PNAS. Furthermore, we made good progress on bringing causality into a Bayesian framework, allowing for scientific reasoning. This problem turned out to be harder than expected, or perhaps it is better to say richer than expected. We hope to produce a paper on this in the autumn.
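To illustrate why combined influences matter, the following toy sketch (an assumption for illustration only, not the framework developed in the project) builds a target z = x * y from two drivers that each take the values -1 and +1. The helper `correlation` and all variable names are hypothetical. Each driver on its own shows zero linear correlation with the target, while the two drivers together determine it completely.

```python
from itertools import product


def correlation(a, b):
    """Pearson correlation of two equally long sequences (population form)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    var_a = sum((p - mean_a) ** 2 for p in a) / n
    var_b = sum((q - mean_b) ** 2 for q in b) / n
    return cov / (var_a * var_b) ** 0.5


# Toy system (hypothetical): every combination of two binary drivers.
samples = list(product([-1, 1], repeat=2))
x = [s[0] for s in samples]
y = [s[1] for s in samples]

# The target depends on the drivers only through their product.
z = [a * b for a, b in zip(x, y)]

# Pairwise view: each driver alone looks unrelated to the target.
corr_xz = correlation(x, z)
corr_yz = correlation(y, z)

# Joint view: the combined term x * y reproduces the target exactly.
joint = [a * b for a, b in zip(x, y)]
corr_joint = correlation(joint, z)

print(corr_xz, corr_yz, corr_joint)
```

A pairwise measure applied to this toy system would report no link from x or y to z, which is the kind of interaction a framework restricted to single drivers cannot capture.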
For objective 2) we have made a start, but real progress is expected with the new hire starting this autumn.
For objectives 3) and 4) huge progress has been made, with several publications and many oral presentations at international conferences and workshops. We have been so successful that we now have 3 working methods to choose from, each with different advantages and disadvantages. I can safely say that we are leading the field for high-dimensional applications.
For objective 5) we have set up the numerical structure and have started initial data-assimilation experiments. It turns out that the runs are more expensive than anticipated. This was unexpected: before the start of the project we had done several studies and gathered extensive expert opinion on the cost, but the estimates were off by a large margin. This means we will have to scale down our ambition somewhat, but we will still be able to perform an extensive study in the last 2 years.
We have made progress beyond the state of the art in two main areas:
1. building a consistent causality framework that allows for inclusion of nonlinear multivariable interactions, and building it into the Bayesian framework
2. developing fully nonlinear data-assimilation methods for high-dimensional geophysical applications.
The second has resulted in numerous high-profile publications, while several papers on the first will be submitted soon.
We are very excited about this progress, which has the potential to change many fields of science.
Difference between assimilation results and observations