EXPLAINABLE AI PIPELINES FOR BIG COPERNICUS DATA

Periodic Reporting for period 1 - DeepCube (EXPLAINABLE AI PIPELINES FOR BIG COPERNICUS DATA)

Reporting period: 2021-01-01 to 2021-12-31

DeepCube leverages advances in AI and the semantic web to unlock the potential of big Copernicus data. DeepCube addresses new and ambitious problems with high environmental and societal impact, which enhance our understanding of Earth’s processes connected to the current and future climate emergency and can generate high business value by enlarging the EO-centered ecosystem of companies and SMEs. To achieve this, we extend mature ICT technologies, such as the Earth System Data Cube, the Semantic Cube, and the Hopsworks platform for distributed Deep Learning (DL), and integrate them to deliver an interoperable platform that can be deployed on several cloud infrastructures. We then use these tools to develop novel DL pipelines that extract value from big Copernicus data. DeepCube develops DL architectures that extend to non-conventional data and problems, introduces a novel hybrid modeling paradigm for data-driven AI models that respect physical laws, and opens up the DL black box through Explainable AI and Causality. We showcase these architectures in five Use Cases (UCs): two on business, two on Earth system sciences, and one on migration (see Figure). The use cases' overall objectives are:
-To predict localized impacts of meteorological drought and heat waves in sub-Saharan Africa,
-To go beyond mere associations and infer cause-effect relations between climate and migration/displacement across sub-Saharan Africa by mining climate anomalies and migration flows, also characterizing the main socio-economic and environmental indicators of migration patterns,
-To better predict the outbreak and extent of wildfires in Mediterranean ecosystems in view of climate change,
-To exploit the abundance of SAR data, primarily Sentinel-1, to model ground deformation patterns associated with volcanic activity,
-To create a commercial service, based on SAR time series products combined with industrial geodetic measurements, to monitor national critical infrastructure works in Italy,
-To incorporate the environmental dimension towards sustainable tourism.
The work performed and main results achieved so far in the context of the DeepCube UCs are summarized as follows:
-In UC1 we created an open-access dataset for vegetation drought forecasting in Africa, consisting of 4 terabytes of Sentinel-2-based minicubes and incorporating more than 10 different spatio-temporal data sources (meteorological, satellite and geospatial). Experiments show that high-resolution spatial context does help to improve vegetation state forecasting.
-In UC2 we harmonized a comprehensive, up-to-date dataset for Somalia covering the socioeconomic, conflict-related and climate-related dimensions of migration, with both EO and non-EO data. Our causal analysis reveals that yearly time frames are critical for drought-driven displacement. We built a causal graph for Baidoa, which indicates that decreases in vegetation and in cattle prices cause drought displacement.
-In UC3 we created an open-access data cube for the wildfire research community: an 800-gigabyte datacube covering Greece for 2009-2021 and comprising 15 different data sources (meteorological, satellite and geospatial). We developed models for next-day fire hazard forecasting in Greece. Our experimental results show that our models capture spatial, temporal and spatio-temporal context with better accuracy than existing methods. We integrated explainability algorithms into the deep learning models and set up a semantic data cube in order to move from hazard prediction to risk assessment.
-In UC4a we created an annotated InSAR dataset and developed a self-supervised learning pipeline for volcanic activity detection, achieving higher accuracy than state-of-the-art methods. We created a second pipeline for synthetic-to-real InSAR data adaptation via prototype-learning-based vision transformers. The model is robust in both the synthetic and the real domain.
-In UC4b we created a training dataset by manually labelling Persistent Scatterer Interferometry SAR (PSInSAR) point clouds. We deployed a Graph Neural Network model trained to identify unreliable points in a PSInSAR point cloud, fusing multiple informative layers (e.g. DEM, land cover/land use). We developed a simulator for the generation of PSInSAR trend-variation time series.
-In UC5 we extracted and harmonised data in analysis-ready format and created a datacube with EO and non-EO data, including Terra Nordeste tourist agency data for touristic activity, Copernicus Atmosphere Monitoring Service data for air quality, and NOAA data for land topography. We implemented an LSTM model for predicting NO2 pollution and applied the SHAP explainability algorithm to it.
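The cause-effect analysis in UC2 rests on the idea of conditioning: two variables that appear associated may turn out to be independent once a mediating variable is controlled for. The following is a minimal, hypothetical sketch of that principle using partial correlation on synthetic data; the variable names and the linear-regression approach are illustrative only and are not the project's actual data or causal-discovery method.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z from both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
# Toy causal chain: drought -> vegetation decline -> displacement.
drought = rng.normal(size=2000)
vegetation = -drought + 0.1 * rng.normal(size=2000)
displacement = -vegetation + 0.1 * rng.normal(size=2000)

# Raw correlation is strong, but it vanishes once the mediator
# (vegetation) is conditioned on: drought acts only via vegetation.
r_raw = np.corrcoef(drought, displacement)[0, 1]
r_part = partial_corr(drought, displacement, vegetation)
```

Conditional-independence tests of this kind are the building block of constraint-based causal-graph discovery: an edge between two variables is kept only if no conditioning set renders them independent.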
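Explainability algorithms such as SHAP, used in UC3 and UC5, attribute a model's predictions to its input features. As a simplified stand-in for illustration only (permutation importance rather than SHAP, on made-up data), the core idea is to shuffle one feature at a time and measure how much the prediction error grows:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Increase in mean squared error when each feature is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((model(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature-target link
            scores[j] += np.mean((model(Xp) - y) ** 2) - base
    return scores / n_repeats

# Toy data: the target depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]

model = lambda X: 3.0 * X[:, 0] + 0.5 * X[:, 1]  # a "perfect" model
imp = permutation_importance(model, X, y)
# Feature 0 dominates; feature 2 scores exactly zero.
```

SHAP refines this idea with game-theoretic (Shapley-value) attributions per prediction, but the interpretation is the same: features whose removal degrades the model most matter most.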

The work performed towards the achievement of our technological objectives is summarized as follows:
-We designed and implemented the first version of the DeepCube platform, in which the various DeepCube technologies are integrated. The platform is delivered as Infrastructure as Code; it is container-ready, GPU-ready, and can utilize scalable clusters.
-We extended the Earth System Data Cube capabilities for on-the-fly regridding and we implemented new and efficient n-dimensional moving window operations for advanced cube analytics.
-We developed the 1st version of the semantic data cube module and applied it in the Use Cases.
-We implemented the 1st iteration for the Visual Query Builder.
-We collected and analysed social media data, extracting location information and visual concepts and developed a standalone social media API to retrieve collected and analysed posts within DeepCube. We designed and developed a Social Media Web App to display data served by the API.
-We configured and deployed the Hopsworks platform and performed a series of technical workshops to train the DeepCube consortium on its use.
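The n-dimensional moving-window operations were implemented within the Earth System Data Cube tooling itself; as a rough analogue for illustration only (NumPy here, not the project's actual code), a windowed mean over the spatial axes of a cube can be expressed with stride tricks, avoiding any explicit Python loop:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_mean_2d(cube, win):
    """Moving mean over the two trailing (spatial) axes of an array.

    `cube` has shape (..., ny, nx); the result shrinks by win-1 along
    each windowed axis, since no padding is applied.
    """
    windows = sliding_window_view(cube, (win, win), axis=(-2, -1))
    return windows.mean(axis=(-2, -1))

a = np.arange(16, dtype=float).reshape(4, 4)
m = moving_mean_2d(a, 2)  # shape (3, 3); m[0, 0] averages a[0:2, 0:2]
```

Because `sliding_window_view` returns a zero-copy view, the same pattern scales to time or variable axes simply by passing different `axis` tuples, which is the essence of cube-level moving-window analytics.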
In the 1st reporting period, DeepCube has published five (5) research articles on arXiv, made five (5) new distinct annotated datasets, appropriate for AI4EO applications, available for download and re-use, and provided open access to three (3) new ontologies. It also maintains two (2) code repositories with public projects. Finally, it has created several openly accessible demos, including eight (8) use case Jupyter notebooks, three (3) visualization tools, and dedicated demos for the DeepCube platform and the Hopsworks deep learning platform. Links to these resources can be found on the DeepCube website.

In addition to what was foreseen in the GA, DeepCube has had new impacts. Firstly, DeepCube has initiated a collaboration with TECNE, a large engineering company of the Autostrade per l'Italia Group, which will exploit AI on InSAR time series for monitoring geo-technical works. DeepCube has created strong links with the IDMC (Internal Displacement Monitoring Centre), facilitating the exchange of data and products and contributing to recommendations on policy and operational decisions that can reduce the risk of future displacement and improve the lives of internally displaced people (IDPs) worldwide. We have also signed an agreement with the Facebook (Meta) Data for Good initiative, through which Facebook provides mobility data for use in the DeepCube use cases. Finally, the DeepCube use cases collaborated and found synergies with the JRC's Knowledge Centre on Migration and Demography and the European Forest Fire Information System (EFFIS).
DeepCube concept: technological developments, data collection, data analysis and demonstrations