Periodic Reporting for period 4 - IntelliAQ (Artificial Intelligence for Air Quality)
Reporting period: 2023-04-01 to 2023-09-30
WP1: Data collection and processing
The database from the Tropospheric Ozone Assessment Report (TOAR) has been extended in several respects. It has reached the highest levels of FAIRness, has been certified as a trustworthy repository by CoreTrustSeal, and now contains >10 TBytes of data from ~100,000 time series at >20,000 measurement stations around the world. A novel infrastructure with a modern REST API has been developed and successfully launched (https://toar-data.fz-juelich.de; Schröder et al., 2019, 2020; Mozaffari et al., 2020). The database contains surface measurements of ozone, particulate matter, nitrogen oxides, and other air pollutants at hourly time resolution, together with a set of meteorological variables extracted from the ERA5 reanalysis. We developed a homogeneous characterisation of station locations based on high-resolution Earth observation datasets, and a new set of quality control flags that allows full traceability of the data quality control steps. The IntelliAQ data infrastructure has been established as a major information hub for air quality data, which can be interfaced with machine learning applications and explored in future projects to improve the understanding of atmospheric composition.
WP2: Interpolation
The focus was on the spatial interpolation of aggregated air quality metrics to produce global high-resolution gridded maps. A novel mapping method based on random forests has been developed and evaluated with several methods from explainable AI and uncertainty quantification (Betancourt et al., 2021a; Stadtler et al., 2022). We also explored graph machine learning methods to improve temporal interpolation and fill gaps in the measurement time series (Betancourt et al., 2023).
WP3: Forecasting
After a systematic exploration of different deep learning architectures (Kleinert et al., 2019, 2021a), we developed a new forecasting tool for air quality at station locations (Leufen et al., 2023), which outperforms a state-of-the-art ensemble of regional chemistry transport models. To ensure reproducibility and transparency, we developed a new software environment (MLAir: Leufen et al., 2021). The important topics of extreme value prediction and threshold exceedances were picked up by Gong et al. (2019) and further pursued in a master's thesis. Effects of atmospheric transport were investigated by Kleinert et al. (2021b). Forecasting of spatiotemporal fields has been developed based on deep learning methods from video prediction (Hußmann, 2019; Gong et al., 2020), including code optimisation for HPC (Kesselheim et al., 2021). A visionary discussion article on “Can deep learning beat numerical weather prediction?” was published in Philosophical Transactions in spring 2021. This article had > 55,000 downloads and > 200 citations in little more than two years after publication.
WP4: Quality assurance
The project contributed to air pollution data quality control by developing a novel modular approach and investigating the statistical foundations of quality control (Kaffashzadeh et al., 2019a, 2019b, 2020).
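To illustrate what a modular quality control scheme with traceable flags can look like, the following is a minimal sketch and not the IntelliAQ or TOAR implementation. All function names, check names, and thresholds are hypothetical and chosen only for illustration. Each check is an independent function, and the result records, for every sample, which checks flagged it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Value = Optional[float]  # None marks a missing measurement
# A check maps a whole series to one boolean per sample (True = suspicious).
Check = Callable[[Sequence[Value]], List[bool]]


def missing_check() -> Check:
    """Flag samples that carry no value."""
    return lambda series: [v is None for v in series]


def range_check(lower: float, upper: float) -> Check:
    """Flag values outside a plausible range (bounds are illustrative)."""
    def check(series: Sequence[Value]) -> List[bool]:
        return [v is not None and not (lower <= v <= upper) for v in series]
    return check


def stuck_value_check(min_run: int) -> Check:
    """Flag runs of at least min_run identical consecutive values."""
    def check(series: Sequence[Value]) -> List[bool]:
        flagged = [False] * len(series)
        start = 0
        for i in range(1, len(series) + 1):
            run_ended = i == len(series) or series[i] != series[start]
            if run_ended:
                if series[start] is not None and i - start >= min_run:
                    for j in range(start, i):
                        flagged[j] = True
                start = i
        return flagged
    return check


def run_qc(series: Sequence[Value],
           checks: Sequence[Tuple[str, Check]]) -> List[List[str]]:
    """Return, for each sample, the names of all checks that flagged it."""
    failed: List[List[str]] = [[] for _ in series]
    for name, check in checks:
        for i, hit in enumerate(check(series)):
            if hit:
                failed[i].append(name)
    return failed


# Hypothetical hourly series with a negative value, a gap,
# a stuck sensor, and an implausible spike.
series = [31.0, 35.5, -4.0, None, 40.0, 40.0, 40.0, 40.0, 900.0]
checks = [
    ("missing", missing_check()),
    ("range", range_check(0.0, 500.0)),
    ("stuck", stuck_value_check(4)),
]
flags = run_qc(series, checks)
for value, names in zip(series, flags):
    print(value, names if names else "ok")
```

Because every check is a separate function and the output keeps the check names instead of a single pass/fail bit, individual tests can be added, removed, or re-parameterised without touching the others, and each flag can be traced back to the step that raised it.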
The project work enabled us to participate in the development of AtmoRep (https://arxiv.org/abs/2308.13280), the world's first large-scale representation model of atmospheric dynamics. AtmoRep has successfully demonstrated that self-supervised learning on large volumes of data (similar to large language models) also provides substantial benefits to weather forecasting and the analysis of weather patterns.
IntelliAQ was by definition an interdisciplinary project, as it combined atmospheric research with machine learning. The air pollution studies performed in IntelliAQ cross-fertilize similar studies on meteorological problems in which our research group collaborates with national and European partners (MAELSTROM, KISTE, WestAI, Warmworld). MLAir, our time series forecasting model, has become the tool of choice for the ERC proof-of-concept grant AQPlus4 (starting in November 2023) and the DestinE use case on air quality (DE370c), contracted by ECMWF.
The knowledge gained in IntelliAQ was presented at numerous conferences and in various journal articles. We also organized three major events as part of the grant: an IntelliAQ and TOAR workshop in Cologne (March 2023) and two workshops on "Transformers for environmental sciences" (Magdeburg, September 2022) and "Large-scale deep learning for the Earth system" (Bonn, September 2023). The latter workshop attracted 350 scientists, who registered for in-person or online participation. The machine learning methods developed in IntelliAQ have become the content of a university lecture on machine learning for atmospheric science at the University of Bonn and of five courses in European summer schools. IntelliAQ aimed to set new standards for interactive FAIR data processing. All software code and data developed in IntelliAQ are open source and freely accessible.