
Statistical Learning for Earth Observation Data Analysis.

Periodic Reporting for period 3 - SEDAL (Statistical Learning for Earth Observation Data Analysis.)

Reporting period: 2018-09-01 to 2020-02-29

The Earth is a highly complex and evolving networked system. In the last few hundred years, human activities have precipitated an environmental crisis on Earth, commonly known as "global climate change". Since the discovery of fossil carbon as a convenient form of energy, the residues of past photosynthetic carbon assimilation have been combusted to carbon dioxide (CO2) and returned to the Earth's atmosphere. Human-induced global climate change makes it necessary to provide quantitative monitoring tools for Earth system processes, as well as qualitative models able to explain the feedback mechanisms at different temporal and spatial scales. In this context, exploiting satellite data, in situ measurements, and statistical methods provides the most convenient and accurate approach to monitoring the Earth in space and time. However, the lack of a unified mathematical framework for predicting, learning and understanding the complex processes and the relations between climate and biogeophysical variables calls for advances in Earth Observation (EO) data processing methods.

The SEDAL project is an interdisciplinary project that aims to develop novel statistical learning methods to analyze EO data. In the last decade, machine learning models have helped to monitor land, oceans, and atmosphere through the analysis and estimation of climate and biophysical parameters. Current approaches, however, cannot deal efficiently with the particular characteristics of remote sensing data. This problem is growing rapidly: several satellite missions with improved spatial, spectral and temporal resolutions are being launched or planned for the immediate future (as in the EU Copernicus programme). In addition, a plethora of in situ measurements on land, ocean and atmosphere are being collected. We face the urgent need to process and understand huge amounts of complex, heterogeneous, multisensor, multitemporal, and unstructured data. Our challenge is to make sense of all these data streams with machine learning methods.

SEDAL is developing new statistical inference methods adapted to the characteristics of EO data. We develop advanced prediction methods that improve efficiency and accuracy while attaining more credible uncertainty estimates, encode physical knowledge about the problem, and derive sensible feature rankings from empirical inference models. Understanding is more difficult than prediction, so we are also concerned with computing causal graphs to explain the potentially complex interactions between observed variables. This project thus addresses the fundamental problem of moving from correlation to dependence and then to causation through EO data analysis. The theoretical developments are guided by the challenging problems of estimating biophysical parameters and climate variables, and learning causal relations at both local and global scales.
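
The step from correlation to dependence mentioned above can be illustrated with a small sketch. It compares the Pearson correlation with a kernel dependence measure, the Hilbert-Schmidt Independence Criterion (HSIC), on synthetic data. This is only an illustration under stated assumptions: the choice of HSIC, the RBF kernel, the median-distance bandwidth and all function names (`rbf_gram`, `median_sigma`, `hsic`) are ours and are not taken from the SEDAL publications.

```python
import numpy as np

def rbf_gram(x, sigma):
    """Gram matrix of an RBF kernel for a 1-D sample x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def median_sigma(x):
    """Median pairwise distance, a common bandwidth heuristic (assumed here)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.median(d[d > 0])

def hsic(x, y):
    """Biased empirical HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    K = rbf_gram(x, median_sigma(x))
    L = rbf_gram(y, median_sigma(y))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y_dep = x ** 2 + 0.05 * rng.normal(size=300)  # nonlinear dependence on x
y_ind = rng.uniform(-1, 1, 300)               # independent of x

pearson_dep = np.corrcoef(x, y_dep)[0, 1]
hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)

print(f"Pearson(x, x^2 + noise) = {pearson_dep:.3f}")
print(f"HSIC(x, x^2 + noise)    = {hsic_dep:.5f}")
print(f"HSIC(x, independent y)  = {hsic_ind:.5f}")
```

In this toy setting the linear correlation is close to zero even though `y_dep` is almost a deterministic function of `x`, whereas the kernel measure separates the dependent pair from the independent one. Neither quantity says anything about the direction of causation, which requires additional assumptions.
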
"
* Description of the action and overview of the main scientific results.

The activities in the SEDAL project are organized in three major workpackages. Task 1 deals with improving retrieval (model inversion) algorithms in several respects, while Task 2 tackles feature extraction and selection, dependence estimation, and the identification of causal relationships in EO data. Empirical evidence for the proposed methods is addressed in Task 3, mainly dealing with land/vegetation monitoring problems. In the first half of the project we put more effort into Task 1 and started working on Task 2. The developed methods have been applied in real Earth Observation scenarios (Task 3), whose scope has been extended from the initial focus on land applications to atmosphere and ocean parameter retrieval as well.

In summary, the SEDAL project has so far led to 108 publications: 2 edited books (Wiley & Sons, 2017), 5 book chapters (Elsevier, Springer-Verlag), 62 international conference papers, and 39 journal papers, most of them published in top venues in the field. The team members gave over 50 invited presentations on the topics of the project at international events, and Prof. Camps-Valls gave more than 10 keynote and invited talks at relevant conferences and symposia.

* Workpackage 1: Improve prediction models by adaptation to Earth Observation data characteristics.

We mainly relied on the frameworks of kernel learning, Bayesian inference and sampling, and deep learning to tackle the inverse problem posed in EO data processing. In particular, Gaussian Processes (GPs) turned out to be a method of choice and proved very useful for many of the problems encountered here. A review paper on GPs for EO data processing was published [J17], along with more detailed book chapters [B1, BC1-BC5]: the framework tackles regression accurately and efficiently, makes it possible to include spatial-spectral-temporal relations and signal-to-noise feature relations and, interestingly, yields confidence intervals for the predictions. We extended the framework in several ways, and summarized it in a review paper on physics-aware machine learning [C27,J39]. The developed methodologies include priors on spatial-spectral-temporal relationships, noise characteristics, fusion of multisource sensory data, prediction consistency imposed via multi-output regression schemes, emulation of physical models, data-model assimilation strategies, and model efficiency via large-scale learning:

1) We included spatial feature relations in the retrieval models in different ways: (1) designing appropriate composite kernels [B4,C22,J6,J30], (2) indirectly via spatio-spectral image compression [J23], (3) through deep convolutional neural networks, both in supervised and unsupervised settings [C55,J15,J19], and (4) through deep Gaussian process structures [C46].

2) We developed kernel-based signal-to-noise ratio algorithms for regression, classification, dimensionality reduction and causal inference [B3,B4]. The framework generalizes standard methods in the literature to the case of signal-to-noise dependencies, resulting in accurate predictions, dependence estimates, and sharper confidence intervals.

3) We exploited multiple sensors through composite kernels [J25] and convolutional neural networks [J19]. In particular, we fused synthetic aperture radar (SAR) and optical imagery for rice monitoring and productivity estimation through time, and provided a purely statistical justification for the combination of hyperspectral and LiDAR sensory data, which received the best paper award in the Data Fusion competition (IEEE IGARSS 2016).

4) Time is an important dimension to encode in the models. We made advances in designing proper kernels for regression and land cover classification [J7,J12,J16,J27], but also designed kernel manifold alignment techniques th
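
Two of the ideas listed in Workpackage 1, GP regression with predictive intervals and composite kernels that combine different feature groups, can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation used in the cited papers: the weighted-sum kernel, the fixed length-scales, the noise variance, the synthetic data and all function names (`rbf`, `composite_kernel`, `gp_predict`) are hypothetical choices made for the example.

```python
import numpy as np

def rbf(A, B, length):
    """RBF kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length ** 2)

def composite_kernel(Xa, Xb, n_spec, w=0.5):
    """Weighted sum of a spectral and a spatial RBF kernel.
    Columns [:n_spec] hold spectral features, the rest spatial ones."""
    k_spec = rbf(Xa[:, :n_spec], Xb[:, :n_spec], length=1.0)
    k_spat = rbf(Xa[:, n_spec:], Xb[:, n_spec:], length=0.5)
    return w * k_spec + (1.0 - w) * k_spat

def gp_predict(X, y, Xs, n_spec, noise_var=0.01):
    """Posterior mean and latent variance of a zero-mean GP at Xs."""
    K = composite_kernel(X, X, n_spec) + noise_var * np.eye(len(X))
    Ks = composite_kernel(Xs, X, n_spec)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    prior_var = np.diag(composite_kernel(Xs, Xs, n_spec))
    # Clip tiny negative values caused by round-off.
    var = np.clip(prior_var - (V ** 2).sum(axis=0), 0.0, None)
    return mean, var

# Synthetic data: 3 "spectral" and 2 "spatial" features (hypothetical).
rng = np.random.default_rng(0)
n_spec = 3
X = rng.uniform(0, 1, size=(100, 5))
f = np.sin(2.0 * X[:, :n_spec].sum(axis=1)) + 0.5 * np.cos(3.0 * X[:, 3])
y = f + 0.1 * rng.normal(size=100)
X_train, y_train, X_test = X[:80], y[:80], X[80:]

noise_var = 0.01
y_mean = y_train.mean()  # center targets to match the zero-mean prior
mean, var = gp_predict(X_train, y_train - y_mean, X_test, n_spec, noise_var)
mean = mean + y_mean

# Approximate 95% predictive interval, including observation noise.
std = np.sqrt(var + noise_var)
lower, upper = mean - 1.96 * std, mean + 1.96 * std
print(np.column_stack([lower, mean, upper])[:5])
```

A sum of valid kernels with non-negative weights is itself a valid kernel, which is what allows each feature group (or each sensor, in a fusion setting) to get its own similarity measure. In practice the weights, length-scales and noise variance would be learned from data, for example by maximizing the marginal likelihood, rather than fixed as here.
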
"We have made progress and impacted the scientific-technical communities in different ways: scientific (generation and dissemination of knowledge), educational (training of PhD student), technological (transfer of technology and methodologies), and societal (application of the developed regression methods to the highly relevant field of EO, with certain application to other fields of Science and Engineering). Three ways serve the scientific content dissemination: (1) publications in reference journals/conferences in machine learning and signal processing (IEEE TNN, JASA, PLOS One) and application community (NATURE, JGR, RSE, IEEE-TGARS, IEEE-GRSL), and (2) organization of special sessions and satellite workshops within the main conferences of targeted research domains (EGU, IGARSS) that brings together researchers in machine learning and also geoscience and remote sensing.

The project is contributing to the (pre-)operational capacities of the EU Copernicus programme in the climate change context, by providing new methodologies suitable for reliable retrieval, analysis, and assessment of the land products. Many applications consider Sentinel data for retrieval, or the emulation of leaf and canopy radiative transfer models (RTMs) for application to Sentinel-2/3 data. These objectives, along with an improved understanding of uncertainties and the exploitation of large heterogeneous data, have recently been identified as the cornerstones of 21st century climate science. A by-product of SEDAL is the strengthening of the competitiveness of European fluorescence and carbon cycle research, by bringing together different experimental, observational and statistical inference communities for an integrated assessment and knowledge generation. Several approaches for retrieval and analysis have been pursued.

The research objectives of SEDAL are also in line with the key headlines of the Call "Earth Observation 2014 of the HORIZON 2020 Work Programme." In particular, the scope of its EO-1-2014 workpackage entitled "New ideas for Earth-relevant space applications" explicitly declares "interest in using Earth-relevant space-based data, new and automatic forecasting models at regional or wider geographical extents." The algorithms and knowledge produced in SEDAL are well aligned with that item, and some applications have a direct impact on the EU Copernicus Sentinel missions as well as on the upcoming ESA Earth Explorer 8 FLEX.

While still premature and uncertain, it does not escape our notice that the new machine learning algorithms developed in SEDAL could find application and be transferred into other domains of Science (not necessarily tied to remote sensing and geosciences), such as econometrics, bioengineering, computational neuroscience, psychology or medicine. After all, encoding and extracting knowledge and identifying causal relations from sensory data is a common problem in all disciplines of Science and Engineering. The current ERC project is opening new frontiers in Earth observation.

Finally, we are currently exploring other avenues beyond the SEDAL aims, such as the analysis of the Earth system under the concepts of sustainability boundaries and planetary boundaries, or the exploitation of machine learning to analyze multidimensional feature relations in the dynamic Earth through the CAB-LAB project hypercube, http://earthsystemdatacube.net. Considering the Earth as a point in a geometric dynamic space, and learning trajectories and permitted states from data with machine learning, could allow us to consider the anthroposphere (socio-economic variables), not just the biosphere-atmosphere variables. All these activities aim to consolidate the group and establish a broad network of collaborators around and beyond the SEDAL ideas.