CORDIS - EU research results

Non-Markovian Memory-Based Modelling of Near- and Far-From-Equilibrium Dynamical Systems

Periodic Reporting for period 3 - NoMaMemo (Non-Markovian Memory-Based Modelling of Near- and Far-From-Equilibrium Dynamical Systems)

Reporting period: 2022-12-01 to 2024-05-31

Our modern world is awash with data, and of the many varieties of data available, time series are among the most pervasive. Experiments, computer simulations, and meteorological and economic observations are all contexts in which vast amounts of time series data are generated and analysed. Extracting useful information from these data, however, is a daunting task. To understand such data, one might aim to discern the form of an underlying equation of motion, or of some other mathematical system, that gives rise to data with similar characteristics. Alternatively, one might want to classify systems by comparing many data sets. Concrete examples of where these questions arise in chemistry and physics include the dynamics of proton transfer in water, chemical reactions along a given reaction coordinate, the folding and unfolding of proteins, and the motion of particles in a liquid. These systems are described by the classical or quantum-mechanical equations of motion, i.e. the Hamilton or Schrödinger equations. In biology, one is often concerned with non-equilibrium systems such as living organisms. Typical time series can be obtained from the motility patterns of cells or multicellular organisms; one task where such data are potentially helpful is the classification of cancer cells as benign or malignant. Time series data also play an important role outside the natural sciences, for example stock prices, currency exchange rates, or meteorological records such as the daily temperature in Berlin. Whether extracted from an experimental apparatus, generated by simulation, or recorded manually, the time series representing all of these systems share fundamental traits. Historically, however, different fields have developed their own tools to analyse their respective data sets.
Physicists and chemists frame their systems in terms of well-known classes of differential equations with noise. Biologists often call on random walk models to describe cell motions. Computational economists employ a range of different algorithmic models. Central to the present ERC project, NoMaMemo, is the proposal that all of these systems can be described within a unified framework based on stochastic coupled non-linear integro-differential equations.
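For concreteness, one common way of writing the simplest member of this class, a one-dimensional generalised Langevin equation for a coordinate x(t) with effective mass m, is shown below in standard textbook notation. The symbols are our own choice rather than the project's: U(x) is the energy landscape, Γ(t) the friction memory kernel, and F_R(t) the random force, whose correlations are tied to Γ by the fluctuation-dissipation theorem at temperature T. This sketch contains linear friction only; the project's hybrid GLE additionally contains a non-linear friction term.

```latex
m\,\ddot{x}(t) = -\frac{\mathrm{d}U}{\mathrm{d}x}\bigl(x(t)\bigr)
  - \int_0^t \Gamma(t-s)\,\dot{x}(s)\,\mathrm{d}s + F_R(t),
\qquad
\langle F_R(t)\,F_R(s)\rangle = k_B T\,\Gamma(|t-s|)
```

In the Markovian limit Γ(t) = 2γδ(t), the memory integral reduces to the instantaneous friction force −γẋ(t) and the ordinary Langevin equation is recovered.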
To achieve these goals, we have developed a suite of advanced numerical techniques that extract key information from a broad range of data sets and are agnostic to the source of the data; that is, the analysis methods are completely general. To accomplish this, we have built on our group's previous work on so-called “memory extraction” techniques. The unified framework shared by so many examples is a set of differential equations that contain an environmental noise contribution, an energy landscape arising from force interactions, and a memory-dependent friction term that couples to the entire history of the system. These equations are referred to as generalised Langevin equations (GLEs). Memory extraction, then, means disentangling the functional form of the memory-dependent friction term directly from the time series data. Having accomplished this, we obtain a specific characterisation of a given system within the completely general NoMaMemo framework.
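The idea of memory extraction can be illustrated with a minimal sketch, assuming a one-dimensional, force-free GLE (no energy landscape), m v̇(t) = −∫₀ᵗ Γ(t − s) v(s) ds + F_R(t), with a random force uncorrelated with the initial velocity. Multiplying by v(0), averaging, and integrating once in time gives m[C(t) − C(0)] = −∫₀ᵗ G(s) C(t − s) ds, where C(t) = ⟨v(t)v(0)⟩ is the velocity autocorrelation function and G(t) = ∫₀ᵗ Γ(s) ds is the running integral of the kernel. Discretising this Volterra equation with the trapezoidal rule turns it into a recursion that yields G step by step from C; Γ(t) then follows by numerical differentiation. The helper names are hypothetical, the force-free setting is an assumption (real data also require the potential term), and this is a textbook inversion scheme, not necessarily the implementation used in NoMaMemo.

```python
import numpy as np


def velocity_autocorrelation(v, n_lags):
    """Estimate C(t_k) = <v(t_k) v(0)> from one stationary velocity trajectory."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    # Average the products v[i + k] * v[i] over all available time origins i.
    return np.array([np.mean(v[k:] * v[: n - k]) for k in range(n_lags)])


def extract_running_memory(c, dt, mass):
    """Solve m (C_n - C_0) = -dt * [sum_{k=1}^{n-1} G_k C_{n-k} + G_n C_0 / 2] for G_n."""
    c = np.asarray(c, dtype=float)
    g = np.zeros_like(c)  # G_0 = 0 by definition of the running integral
    for n in range(1, len(c)):
        # Interior trapezoid terms k = 1 .. n-1 (the k = 0 term vanishes since G_0 = 0).
        interior = np.dot(g[1:n], c[n - 1 : 0 : -1])
        # The endpoint k = n enters with weight 1/2; solve for it.
        g[n] = 2.0 * (-mass * (c[n] - c[0]) / dt - interior) / c[0]
    return g


def vacf_from_running_memory(g, c0, dt, mass):
    """Forward problem: march the same discretised equation to obtain C from G."""
    g = np.asarray(g, dtype=float)
    c = np.zeros_like(g)
    c[0] = c0
    for n in range(1, len(g)):
        interior = np.dot(g[1:n], c[n - 1 : 0 : -1])
        c[n] = c0 - dt / mass * (interior + 0.5 * g[n] * c0)
    return c


# Round trip with a hypothetical exponential kernel Gamma(t) = (gamma / tau) exp(-t / tau),
# whose running integral is G(t) = gamma * (1 - exp(-t / tau)).
dt, mass, gamma, tau = 0.01, 1.0, 1.0, 0.1
t = dt * np.arange(400)
g_true = gamma * (1.0 - np.exp(-t / tau))
c = vacf_from_running_memory(g_true, c0=1.0, dt=dt, mass=mass)
g_est = extract_running_memory(c, dt, mass)
kernel_est = np.gradient(g_est, dt)  # Gamma(t) recovered by numerical differentiation
print(np.max(np.abs(g_est - g_true)))
```

The round trip only checks that the inversion is exact for data produced by the same discretisation. On measured trajectories, C would instead be estimated with velocity_autocorrelation, and statistical noise in C then limits how accurately G, and especially its derivative Γ, can be resolved.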

We have also clarified how to derive the GLE from the underlying equations of motion using so-called projection techniques. Whereas the classical projection methods of Mori and Zwanzig lead to rather unwieldy expressions, we have used a novel hybrid projection technique to derive an exact GLE whose parameters can all be extracted uniquely from time series data. Our hybrid GLE contains a non-linear friction term that has been neglected in previous works. With our exact formulation of the GLE, we can quantify the importance of the non-linear term and therefore assess quantitatively whether neglecting it is justified.

So far, we have applied our GLE techniques to a variety of systems. For the ultrafast vibrations of molecules, we have extracted the time-dependent friction acting on vibrating chemical bonds and can therefore predict infrared spectra in very good agreement with experiments. For protein folding, we have shown that non-Markovian friction effects not only determine the diffusive kinetics along the folding reaction coordinate but also modify the folding times. Finally, for a stochastic searcher trying to find randomly distributed targets, we have shown that memory in the searcher's random walk improves the search efficiency when the targets are distributed in a correlated fashion.
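To illustrate how a frequency-dependent friction shapes a vibrational line, here is a minimal linear-response sketch. It assumes a single harmonic bond coordinate (mass m, force constant k) subject to a hypothetical single-exponential memory kernel Γ(t) = (γ/τ)e^(−t/τ) with illustrative, dimensionless parameters; it is a textbook model, not the project's method for predicting spectra from extracted friction. For m ẍ = −kx − ∫₀ᵗ Γ(t − s) ẋ(s) ds + F(t), the response function is χ(ω) = 1/[k − mω² − iωΓ̃(ω)] with Γ̃(ω) = ∫₀^∞ Γ(t)e^(iωt) dt = γ/(1 − iωτ), and the absorption lineshape is taken to be proportional to ω Im χ(ω). The function names are hypothetical.

```python
import numpy as np


def exp_kernel_ft(omega, gamma, tau):
    """One-sided Fourier transform of Gamma(t) = (gamma / tau) * exp(-t / tau)."""
    return gamma / (1.0 - 1j * omega * tau)


def ir_lineshape(omega, mass, k, gamma, tau):
    """Lineshape proportional to omega * Im chi(omega) for a harmonic bond with memory friction."""
    chi = 1.0 / (k - mass * omega**2 - 1j * omega * exp_kernel_ft(omega, gamma, tau))
    return omega * chi.imag


omega = np.linspace(0.01, 3.0, 3000)
with_memory = ir_lineshape(omega, mass=1.0, k=1.0, gamma=0.05, tau=0.5)
markovian = ir_lineshape(omega, mass=1.0, k=1.0, gamma=0.05, tau=0.0)  # tau = 0: no memory
peak_memory = omega[np.argmax(with_memory)]
peak_markov = omega[np.argmax(markovian)]
print(peak_markov, peak_memory)
```

In this toy model the real part of the memory term, ω²τγ/(1 + ω²τ²), acts as an additional restoring force, so the absorption maximum sits slightly above the Markovian peak at √(k/m); setting τ = 0 recovers ordinary frictional damping.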
The importance of our results cannot be overstated. We will provide a numerical-analytical toolbox that can, in principle, be used by physicists, chemists, biologists, computer scientists, economists, and others to generate well-defined models of their data. The breadth of this reach is attested by the diverse work our group has accomplished so far. Furthermore, because our protocol is founded on the GLE framework, the methods rest on a well-established tradition in theoretical physics. What sets NoMaMemo apart is the development of advanced memory extraction techniques and their successful testing across many fields. At the level of society, our tools are readily available for understanding time series data.

The expected results of the project are as follows. We plan to develop general techniques for extracting memory functions from time series data, including non-linear friction effects that could not be treated with previous methods. We will classify models as equilibrium or non-equilibrium. We will develop clustering techniques that allow us to analyse single cells and organisms based on their movement patterns; on this basis, we will be able to cluster and sort single cells and to construct simple models of their internal network interactions and driving forces from the time dependence of the extracted memory function. We will develop techniques for memory-based time series prediction; we are currently testing them on meteorological data and have achieved weather-prediction accuracies that can compete with professional forecasting tools. We will develop simulation methods that generate time series data from memory models; these methods are essential for testing our memory extraction tools. We will further develop our memory-based optimisation techniques for random walks, advance the memory-based modelling of reaction kinetics and protein folding dynamics, and, finally, further develop memory-based techniques for predicting the infrared absorption spectra of complex solutions and molecules. We have already made significant progress towards all of these objectives.
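Generating synthetic trajectories from a memory model, which is what allows extraction tools to be tested against a known kernel, can be sketched as follows. The sketch assumes a one-dimensional GLE with a single-exponential kernel Γ(t) = (γ/τ)e^(−t/τ) and random forces obeying the fluctuation-dissipation theorem. For this kernel, the memory integral can be replaced exactly by an auxiliary Ornstein-Uhlenbeck force z obeying dz = −(z + γv) dt/τ + (√(2k_BTγ)/τ) dW, with m dv = [F(x) + z] dt. This standard Markovian-embedding trick, the function name simulate_exponential_gle, and the first-order time stepping are illustrative assumptions, not the project's simulation method.

```python
import math

import numpy as np


def _no_force(x):
    """Default: force-free particle (flat energy landscape)."""
    return 0.0


def simulate_exponential_gle(n_steps, dt, mass, gamma, tau, kT, force=_no_force, seed=0):
    """Generate x(t), v(t) from m dv/dt = F(x) - int_0^t Gamma(t-s) v(s) ds + F_R(t)
    with Gamma(t) = (gamma / tau) exp(-t / tau), via an auxiliary OU force z."""
    rng = np.random.default_rng(seed)
    decay = math.exp(-dt / tau)
    # Standard deviation of the exact Ornstein-Uhlenbeck noise increment over one step.
    noise_std = math.sqrt(kT * gamma / tau * (1.0 - decay**2))
    noise = rng.standard_normal(n_steps).tolist()
    xs = np.empty(n_steps)
    vs = np.empty(n_steps)
    x = v = z = 0.0
    for i in range(n_steps):
        # Exact OU update of z over one step, holding v fixed during the step.
        z = z * decay - gamma * v * (1.0 - decay) + noise_std * noise[i]
        # Semi-implicit Euler update: velocity first, then position with the new velocity.
        v += dt * (force(x) + z) / mass
        x += dt * v
        xs[i] = x
        vs[i] = v
    return xs, vs


x, v = simulate_exponential_gle(200_000, dt=0.01, mass=1.0, gamma=1.0, tau=0.2, kT=1.0, seed=1)
# Equipartition check after discarding a short burn-in: <v^2> should be close to kT / m.
print(np.mean(v[1_000:] ** 2))
```

With the default force the particle is free; passing, for example, force=lambda x: -x adds a harmonic energy landscape. Kernels that are sums of exponentials can be handled in the same way, with one auxiliary variable per exponential.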
Non-Markovian memory-dependent protein folding of the alpha3D protein