- Regarding the team
The project started with a delay of 9-12 months, agreed with the project managers, because of COVID-related hiring problems. The research team comprises Haochen Wang, Ilze Amanda Auzina, and Leonard Bereska; a postdoctoral researcher is currently being hired (the vacancy closes in September 2023).
- What is the problem/issue being addressed?
Most machine learning applications have focused on static problems, e.g. classification or regression. However, the majority of real-world problems are dynamic, as in videos recording everyday activities and events or videos of scientific recordings. Hence, the project's overall objective is to learn the true dynamics from observational data.
- Why is it important for society?
Learning from data is becoming increasingly relevant, with AI being used in climate, molecular dynamics, and robotics models. Moreover, static data and the models trained on them are generally easier to control, because they involve much smaller amounts of data and make a near-exhaustive examination of model behavior feasible. As we move to dynamic data, however, the same models are not a good fit, because the assumptions made for static data (a limited number of appearance variations, a limited number of correlation patterns, near-stationarity) are no longer guaranteed. This means not only that existing models in the literature will not work well, but also that it will be harder to ensure their safety and reliability. Since in recent years there has been increasing concern for AI safety at both the scientific and the societal (EU, world) level, it is important to have reliable models for dynamic data, which make up the vast majority of real-life data.
- What are the overall objectives?
The objectives of this project can be divided into temporal machine learning objectives, temporal computer vision objectives, and temporal AI safety objectives.
Regarding temporal machine learning objectives, we will explore learning dynamics (i) from a Bayesian perspective with informative priors and (ii) with continuous latent models that account for time-invariant information, as well as (iii) learning a mixture of non-linear dynamics with transformer-inspired dynamics slots.
Regarding temporal computer vision objectives, with consumer videos recording events and activities, understanding all that is happening in the video is critical, from recognizing objects and their instances, to actions, to events (complex actions), to interactions (actions between objects). Central to this is Video Instance Segmentation (VIS), which aims at segmenting and categorizing objects in videos from a closed set of training categories and therefore lacks the generalization ability to handle novel categories in real-world videos.
Regarding temporal AI safety objectives, to ensure control, the overall objectives include mechanistic interpretability and the emergence of goal formation.