Periodic Reporting for period 1 - IntelliAQ (Artificial Intelligence for Air Quality)
Reporting period: 2018-10-01 to 2020-03-31
The IntelliAQ project develops novel approaches for the analysis and synthesis of global air quality data based on deep neural networks. The foundation of this project is one of the world’s largest collections of surface air quality measurements, which was recently assembled at the Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich and plays a pivotal role in the first comprehensive Tropospheric Ozone Assessment Report (TOAR), which is currently ongoing.
This database is complemented with data from OpenAQ, the world’s leading effort to collect global air pollutant measurements in near real time. By combining this unprecedented wealth of global air quality data with high-resolution geodata, weather model output, and satellite retrievals of atmospheric composition, large training datasets for deep learning will be constructed. These datasets provide a globally consistent characterization of individual measurement locations and regional air pollution patterns.
State-of-the-art deep learning methods are applied to this dataset in order to:
* fill observation gaps in space and time,
* provide short-term forecasts of air quality, and
* assess the quality of air pollutant information from diverse measurements.
The combination of diverse data sources is unique, and the project is the first to exploit the full potential of deep neural networks for global air quality data. Achieving the three IntelliAQ objectives will take the analysis of global air pollutant observations to a new level and provide a basis for the future development of innovative air quality services with a robust scientific underpinning.
1. Definition of a data infrastructure and workflows
see attached figure 1
Specific work accomplished in this area:
* defined a workflow and built a system to regularly copy current air quality data from OpenAQ
* negotiated a contract with OpenAQ to improve the OpenAQ metadata model and metadata editing
* reviewed the progress of the OpenAQ team and their contractor; a unique station ID was defined, as were a metadata model and a metadata editor. The work contract is close to completion.
* defined a new, enhanced database model for the TOAR database to allow easier access to measurement series and better documentation of data provenance
* developed web services to allow flexible extraction of geospatial properties around measurement station locations. These data are used as categories to classify stations and to disaggregate deep learning results. Development of these web services is 75% complete. Geospatial data providers have been contacted to sort out potential data protection issues. The web service concept was presented at the 14th IEEE eScience conference in Amsterdam in November 2018.
* began development of an automated quality control tool for environmental time series. The tool is based on robust statistical methods and implemented as an open-source Python package. It was presented at a workshop of the Helmholtz Digital Earth project and at the EGU conference in Vienna, Austria, in April 2019. Development is 30% complete.
* training of team members to achieve a common level of proficiency in software development, including coding standards, use of software repositories, code documentation, etc.
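The quality control tool is described only as being based on robust statistical methods. As a minimal sketch of what one such check can look like, assuming a median/MAD outlier test (the modified z-score with the conventional 3.5 threshold), the hypothetical function `flag_outliers_mad` below flags values that deviate strongly from the bulk of a series. It illustrates the idea and is not the actual IntelliAQ/TOAR package; the ozone values are invented.

```python
from statistics import median


def flag_outliers_mad(values, threshold=3.5):
    """Flag outliers with the modified z-score (median and MAD).

    Hypothetical illustration; the real QC tool may apply other tests.
    Returns a list of booleans, True where a value is flagged as suspicious.
    """
    med = median(values)
    # Median absolute deviation: a robust spread estimate that a few
    # extreme values cannot inflate (unlike the standard deviation).
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to scale by (e.g. a constant series): flag nothing.
        return [False] * len(values)
    # 0.6745 makes the MAD comparable to the standard deviation
    # for normally distributed data.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Invented hourly ozone values (ppb) containing one implausible spike.
ozone = [30.1, 31.4, 29.8, 30.6, 95.0, 30.9, 31.2]
print(flag_outliers_mad(ozone))
```

Using the median and MAD instead of the mean and standard deviation prevents a single large spike from hiding itself by inflating the spread estimate.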
2. Deep Learning
* two master's theses were completed exploring different deep learning concepts: time series analysis with convolutional neural networks (CNN) (Kleinert, 2018) and movie frame prediction of surface temperature fields (Hussmann, 2019).
* the Kleinert (2018) work was presented at the EGU conference in Vienna, Austria in April 2019.
* explored methods for dealing with imbalanced datasets (Bing et al., 2019, presented at the EGU conference in Vienna, Austria in April 2019)
* training of team members in deep learning (Coursera and other training courses)
* exploration of more sophisticated deep learning techniques, adaptation of these techniques to meteorological and air quality data, and implementation of software on Jülich HPC systems; first studies to test the performance of higher-order deep learning networks.
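The report does not say which methods for imbalanced datasets were explored. For illustration only, the sketch below shows random oversampling, one generic technique: samples of under-represented classes (for example, rare high-ozone days) are duplicated until all classes are equally frequent. The function name `random_oversample`, the data, and the 70 ppb threshold are hypothetical and not taken from Bing et al. (2019).

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Balance classes by duplicating randomly drawn samples.

    Each class is topped up (sampling with replacement from its own
    members) until it has as many samples as the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        members = [x for x, y in zip(features, labels) if y == cls]
        extra = [rng.choice(members) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Invented daily maximum ozone values (ppb); label 1 marks a "high-ozone
# day" above an arbitrary 70 ppb threshold, 0 otherwise.
daily_max = [42, 55, 48, 61, 39, 77, 50, 44, 83, 58]
labels = [1 if v > 70 else 0 for v in daily_max]
x_bal, y_bal = random_oversample(daily_max, labels)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only; duplicating samples before splitting would leak copies of test cases into training.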
Training a diverse research team to reach state-of-the-art skills in Earth system science as well as in software development and data management goes, in combination, beyond state-of-the-art education and training. While each element of the IntelliAQ work has so far remained unspectacular on its own, the combination of novel web service concepts with state-of-the-art deep learning methods is beginning to yield novel, cutting-edge results. This progress requires a solid understanding of meteorology and air quality science as well as of deep learning methods and computer code. Therefore, training of team members has been an important element of the first project year.
First experiments with different neural network architectures demonstrate that it is highly non-trivial to achieve fundamental improvements in air quality forecasts with machine learning over classical statistical methods. This is related to fundamental properties of meteorological and air quality data, which differ from classical machine learning data such as images, text, or speech. Spatio-temporal context is extremely important and will be taken into account in the next project phase. As yet, no reliable prediction can be made about how successful deep learning methods will be in achieving the project objectives.
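Judging whether a neural network improves on classical methods requires comparing it against a reference forecast. One common measure, assumed here rather than taken from the report, is the MSE-based skill score, 1 - MSE(forecast) / MSE(reference), with a simple reference such as persistence (today's value used as tomorrow's forecast). The sketch below uses invented numbers purely to show the arithmetic; `model` stands in for a hypothetical network forecast.

```python
def mse(forecast, observed):
    """Mean squared error between paired forecasts and observations."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


def skill_score(forecast, reference, observed):
    """MSE-based skill score: 1 - MSE(forecast) / MSE(reference).

    1 is a perfect forecast, 0 means no improvement over the reference,
    and negative values mean the forecast is worse than the reference.
    """
    return 1.0 - mse(forecast, observed) / mse(reference, observed)


# Invented daily ozone values (ppb), not project results.
obs = [40, 42, 45, 43, 50]
persistence = obs[:-1]    # previous day's value as the forecast
target = obs[1:]          # observations for days 2 to 5
model = [41, 44, 44, 48]  # hypothetical neural network forecast
print(round(skill_score(model, persistence, target), 3))
```

Persistence is a deliberately weak baseline; a fairer test of the statement above would use a statistical model as the reference, which typically yields a much smaller skill score.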