CORDIS - EU research results

Artificial Intelligence for Air Quality

Periodic Reporting for period 4 - IntelliAQ (Artificial Intelligence for Air Quality)

Reporting period: 2023-04-01 to 2023-09-30

Artificial Intelligence (AI) has been transforming science and society since natural language models were released and ground-breaking results were published on cognitive problems such as image and speech recognition, automated language translation, robotics, and strategic games. This has become possible because of recent advances in massive data processing capabilities (“Big Data”) and the development of deep learning (DL) architectures, which “learn” billions of parameters. The uptake of DL in environmental science was initially slow, and IntelliAQ was one of the first projects in the atmospheric sciences to fully embrace the potential of modern big data processing and DL. The project has thus been at the forefront of developing AI applications for air quality, weather, and climate. The IntelliAQ team witnessed and contributed to major breakthroughs, such as the first large-scale atmospheric foundation model, AtmoRep. In the past 18 months alone, several studies, primarily from large tech companies in the US and China, demonstrated the superiority of AI models in weather forecasting, and this is now disrupting the landscape of operational weather forecasting. With IntelliAQ, we have been able to closely follow these developments and spearhead similar breakthrough developments from academic institutions in Europe. IntelliAQ has shifted the analysis of global air pollutant observations to a new level and provides a basis for the future development of innovative air quality services with robust scientific underpinning.
The IntelliAQ project was structured into four work packages. The following sections briefly summarize the work performed in each of them.

WP1: Data collection and processing
The database from the Tropospheric Ozone Assessment Report (TOAR) has been extended in several respects. It has reached the highest levels of FAIRness, has been certified as a CoreTrustSeal trustworthy repository, and now contains >10 TBytes of data from ~100,000 time series at >20,000 measurement stations around the world. A novel infrastructure with a modern REST API has been developed and successfully launched (https://toar-data.fz-juelich.de; Schröder et al., 2019, 2020; Mozaffari et al., 2020). The database contains surface measurements of ozone, particulate matter, nitrogen oxides, and other air pollutants at hourly time resolution, and a set of meteorological variables extracted from the ERA5 reanalysis. A homogeneous characterisation of station locations through high-resolution Earth observation datasets and a new set of quality control flags allowing full traceability of data quality control steps were developed. The IntelliAQ data infrastructure has been established as a major information hub on air quality related data, which can be interfaced with machine learning applications and explored in future projects to improve the understanding of atmospheric composition.

WP2: Interpolation
The focus was on the spatial interpolation of aggregated air quality metrics to produce global high-resolution gridded maps. A novel mapping method based on random forests has been developed and evaluated with several methods from explainable AI and uncertainty quantification (Betancourt et al., 2021a; Stadtler et al., 2022). We also explored graph machine learning methods to obtain improved temporal interpolations and fill gaps in the measurement time series (Betancourt et al., 2023).
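To make the gap-filling idea concrete, the following is a minimal sketch that fills missing values in a station time series from neighbouring stations on a weighted graph. It is a plain neighbour-weighted average, a much simpler stand-in and not the graph machine learning method of Betancourt et al. (2023). The station names, edge weights, and values are hypothetical.

```python
def fill_gaps(series_by_station, edges):
    """Fill None entries with the weighted mean of neighbouring stations.

    series_by_station: dict mapping station name -> list of float or None
    edges: dict mapping station name -> {neighbour name: weight}
    Only original (unfilled) neighbour values are used, so the result
    does not depend on the order in which stations are processed.
    """
    filled = {name: list(values) for name, values in series_by_station.items()}
    for name, values in series_by_station.items():
        for t, value in enumerate(values):
            if value is not None:
                continue
            weighted_sum = 0.0
            weight_total = 0.0
            for neighbour, weight in edges.get(name, {}).items():
                neighbour_value = series_by_station[neighbour][t]
                if neighbour_value is not None:
                    weighted_sum += weight * neighbour_value
                    weight_total += weight
            if weight_total > 0:
                filled[name][t] = weighted_sum / weight_total
            # Otherwise the gap stays None: no neighbour reported a value.
    return filled


# Hypothetical example: three stations, hourly ozone values, two gaps.
example_series = {
    "A": [30.0, None, 36.0],
    "B": [28.0, 32.0, None],
    "C": [40.0, 44.0, 48.0],
}
# Hypothetical edge weights (for instance, inverse distance between stations).
example_edges = {
    "A": {"B": 2.0, "C": 1.0},
    "B": {"A": 2.0},
    "C": {"A": 1.0},
}
example_filled = fill_gaps(example_series, example_edges)
```

A learned graph model would replace the fixed weights with parameters fitted to the data; the sketch only illustrates how station neighbourhoods can carry information into a gap.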

WP3: Forecasting
After a systematic exploration of different deep learning architectures (Kleinert et al., 2019, 2021a), we developed a new forecasting tool for air quality at station locations (Leufen et al., 2023), which outperforms a state-of-the-art ensemble of regional chemistry transport models. To ensure reproducibility and transparency, we developed a new software environment (MLAir: Leufen et al., 2021). The important topics of extreme value prediction and threshold exceedances were picked up by Gong et al. (2019) and further pursued in a master's thesis. Effects of atmospheric transport were investigated by Kleinert et al. (2021b). Forecasting of spatiotemporal fields has been developed based on deep learning methods from video prediction (Hußmann, 2019; Gong et al., 2020), including code optimisation for HPC (Kesselheim et al., 2021). A visionary discussion article on “Can deep learning beat numerical weather prediction?” was published in Philosophical Transactions in spring 2021. This article had >55,000 downloads and >200 citations in little over two years since it was published.
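Claims such as "outperforms a reference model" and "predicts threshold exceedances" are typically backed by standard verification scores. The sketch below shows two common ones, the mean-squared-error skill score relative to a reference forecast and contingency-table scores for threshold exceedances. It is an illustration under assumed data, not the evaluation procedure used in the cited papers, and all values and function names are hypothetical.

```python
from math import fsum


def mse(pred, obs):
    """Mean squared error between two equally long, non-empty sequences."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and equally long")
    return fsum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def mse_skill_score(model, reference, obs):
    """1 - MSE_model / MSE_reference; positive values mean the model
    has a lower error than the reference forecast."""
    return 1.0 - mse(model, obs) / mse(reference, obs)


def exceedance_scores(pred, obs, threshold):
    """Hit rate, false alarm ratio, and critical success index (CSI)
    for events where a value exceeds the threshold."""
    hits = misses = false_alarms = 0
    for p, o in zip(pred, obs):
        if o > threshold and p > threshold:
            hits += 1
        elif o > threshold:
            misses += 1
        elif p > threshold:
            false_alarms += 1

    def ratio(numerator, denominator):
        # None signals that the score is undefined (no relevant events).
        return numerator / denominator if denominator else None

    return {
        "hit_rate": ratio(hits, hits + misses),
        "false_alarm_ratio": ratio(false_alarms, hits + false_alarms),
        "csi": ratio(hits, hits + misses + false_alarms),
    }


# Hypothetical observations and forecasts (arbitrary concentration units).
example_obs = [40.0, 70.0, 90.0, 120.0, 60.0]
example_model = [45.0, 65.0, 95.0, 110.0, 75.0]
example_reference = [50.0, 50.0, 80.0, 90.0, 90.0]

example_skill = mse_skill_score(example_model, example_reference, example_obs)
example_scores = exceedance_scores(example_model, example_obs, threshold=70.0)
```

Reporting exceedance scores alongside an error-based skill score matters because a model can have a low average error while still missing the rare high-pollution events that are most relevant for health.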

WP4: Quality assurance
The project made some relevant contributions to air pollution data quality control by developing a novel modular approach and investigating the statistical foundations of quality control (Kaffashzadeh et al., 2019a, 2019b, 2020).
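As an illustration of what a modular quality control approach can look like, the sketch below runs a list of independent tests over a time series and records, for each value, the names of the tests it failed, so that every flag can be traced back to a specific test. The tests, thresholds, and names are hypothetical and are not taken from Kaffashzadeh et al.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QCResult:
    value: Optional[float]
    failed_tests: List[str] = field(default_factory=list)


def missing_test(series, i):
    """Pass if a value is present."""
    return series[i] is not None


def make_range_test(lower, upper):
    """Pass if the value lies in [lower, upper]; missing values are
    left to missing_test and pass here."""
    def range_test(series, i):
        value = series[i]
        return value is None or lower <= value <= upper
    return range_test


def make_persistence_test(window):
    """Fail if the value and the `window` values before it are all
    identical, which may indicate a stuck sensor."""
    def persistence_test(series, i):
        if i < window:
            return True
        recent = series[i - window:i + 1]
        return None in recent or len(set(recent)) > 1
    return persistence_test


def run_qc(series, tests):
    """Apply every (name, test) pair to every value and collect failures."""
    return [
        QCResult(value, [name for name, test in tests if not test(series, i)])
        for i, value in enumerate(series)
    ]


# Hypothetical hourly values with a gap, a negative value, and a flat run.
example_series = [31.0, 33.5, None, -4.0, 52.0, 52.0, 52.0, 52.0, 47.2]
example_tests = [
    ("missing", missing_test),
    ("range", make_range_test(0.0, 500.0)),
    ("persistence", make_persistence_test(3)),
]
example_results = run_qc(example_series, example_tests)
```

Because each test is a separate function with its own name, tests can be added, removed, or re-parameterised per variable without touching the others.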
IntelliAQ explored and developed several novel methodologies. When IntelliAQ started, no other work on machine learning for air quality analysis or forecasting employed modern deep learning methods. Besides the adoption of new deep learning models and the associated software stack, we made important contributions to the field through the development of statistically sound data preparation and model evaluation methods.
The project work enabled us to participate in the development of AtmoRep (https://arxiv.org/abs/2308.13280), which is the world's first large-scale representation model of atmospheric dynamics. AtmoRep has successfully demonstrated that self-supervised learning on large volumes of data (similar to large language models) also provides substantial benefits to weather forecasting and the analysis of weather patterns.
IntelliAQ was by definition an interdisciplinary project, as it combined atmospheric research with machine learning. The air pollution studies performed in IntelliAQ cross-fertilize similar studies on meteorological problems where our research group collaborates with national and European partners (MAELSTROM, KISTE, WestAI, Warmworld). MLAir, our time series forecasting model, has become the tool of choice for the ERC proof-of-concept grant AQPlus4 (starting in November 2023) and the DestinE use case on air quality (DE370c), contracted by ECMWF.
The knowledge gained in IntelliAQ was presented at numerous conferences and in various journal articles. We also organized three major events as part of the grant: an IntelliAQ and TOAR workshop in Cologne (March 2023) and two workshops on "Transformers for environmental sciences" (Magdeburg, September 2022) and "Large-scale deep learning for the Earth system" (Bonn, September 2023), respectively. The latter workshop attracted 350 scientists, who registered for in-person or online participation. The machine learning methods developed in IntelliAQ have become the content of a university lecture on machine learning for atmospheric science at the University of Bonn and of five courses in European summer schools. IntelliAQ aimed to set new standards for interactive FAIR data processing. All software code and data developed in IntelliAQ are open source and freely accessible.
Figure 3: Examples of deep learning...
Figure 1: The data fusion and machine learning concept of the IntelliAQ project (from Schultz, 2020)