
Statistical Learning for Earth Observation Data Analysis.

Periodic Reporting for period 4 - SEDAL (Statistical Learning for Earth Observation Data Analysis.)

Reporting period: 2020-03-01 to 2020-08-31

The Earth is a highly complex, evolving, networked system. Over the last few hundred years, human activities have precipitated an environmental crisis on Earth. The global climate change induced by human activities makes it necessary to provide quantitative tools for monitoring Earth system processes, as well as qualitative models able to explain feedback mechanisms at different temporal and spatial scales. Exploiting satellite data, in situ measurements, and statistical methods provides the most convenient and accurate approach to monitoring the Earth in space and time. However, the lack of a unified mathematical framework for predicting, learning, and understanding the complex processes and the relations between climate and biogeophysical variables calls for advances in Earth Observation (EO) data processing methods. SEDAL is an interdisciplinary project that aims to develop novel statistical learning methods to analyze EO data. In the last decade, machine learning models have helped monitor the land, oceans, and atmosphere through the analysis and estimation of climate and biophysical parameters. Current approaches, however, cannot efficiently deal with the particular characteristics of remote sensing data. This problem is growing rapidly: several satellite missions with improved spatial, spectral, and temporal resolutions are being launched or are planned for the immediate future. In addition, a plethora of in situ measurements on land, ocean, and atmosphere are being collected. We face an urgent need to process and understand huge amounts of complex, heterogeneous, multisensor, multitemporal, and unstructured data. Making sense of all these data streams with machine learning methods is our challenge. SEDAL develops new statistical inference methods adapted to the characteristics of EO data.
We develop advanced prediction methods that improve efficiency and accuracy while providing more credible uncertainty estimates, encoding physical knowledge about the problem, and deriving sensible feature rankings from empirical inference models. Understanding is more difficult than prediction, so we also compute causal graphs to explain the potentially complex interactions between observed variables. The project thus addresses the fundamental problem of moving from correlation to dependence, and then to causation, through EO data analysis. The theoretical developments are guided by the challenging problems of estimating biophysical parameters and climate variables, and of learning causal relations at both local and global scales.
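To make the step from correlation to causation concrete, the sketch below implements the classical bivariate, linear Granger test, one of the families of causal discovery approaches named in the WP2 summary. It is a textbook illustration under simple assumptions (linear autoregressive models, a fixed lag, synthetic data), not the project's own algorithms; the function names `lagged_design` and `granger_f_stat` are hypothetical. Unlike a correlation coefficient, which is symmetric, the test asks a directional question: does the past of one series improve the prediction of another beyond that series' own past?

```python
import numpy as np


def lagged_design(series_list, lag):
    """Stack `lag` past values of each series into a design matrix with an intercept."""
    n = len(series_list[0])
    cols = [np.ones(n - lag)]
    for s in series_list:
        for k in range(1, lag + 1):
            # Row t of this column holds s[(lag + t) - k], i.e. the value k steps back.
            cols.append(s[lag - k : n - k])
    return np.column_stack(cols)


def granger_f_stat(cause, effect, lag=2):
    """F-statistic testing whether past `cause` improves prediction of `effect`.

    Compares a restricted autoregression (effect's own past only) with an
    unrestricted one (effect's past plus cause's past), both fit by least squares.
    """
    target = effect[lag:]
    X_r = lagged_design([effect], lag)
    X_u = lagged_design([effect, cause], lag)
    rss = []
    for X in (X_r, X_u):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        rss.append(resid @ resid)
    rss_r, rss_u = rss
    df_num = X_u.shape[1] - X_r.shape[1]  # number of added lagged `cause` terms
    df_den = len(target) - X_u.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Synthetic example (assumption): x drives y with a one-step delay, not vice versa.
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

f_xy = granger_f_stat(x, y)  # does the past of x help predict y?
f_yx = granger_f_stat(y, x)  # does the past of y help predict x?
print(f"x -> y: F = {f_xy:.1f}; y -> x: F = {f_yx:.1f}")
```

A large F-statistic for x → y and a small one for y → x recovers the direction built into the synthetic data. Real EO time series usually call for nonlinear or kernel-based extensions and for care with confounders, seasonality, and autocorrelation.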
Highly productive research activities have been carried out. In summary, the project led to 232 publications: 3 books and book chapters, 132 international conference papers, and 97 journal papers, most of them published in the top venues of the field. Team members gave over 50 invited presentations on the project's topics at international events, and Prof. Camps-Valls gave more than 15 keynote and invited talks at relevant conferences. A complete list of publications is available at

WP1 improved prediction models by adapting them to the characteristics of Earth Observation data. We mainly relied on the frameworks of kernel learning, Bayesian inference, and deep learning to tackle the inverse problems posed in EO data processing. Gaussian Processes (GPs) allowed us to include spatial-spectral-temporal relations, signal-to-noise feature relations, and confidence intervals for the predictions. We developed GPs that incorporate physics-inspired priors, noise characteristics, multisource sensor fusion, multi-output regression, emulation of physical models, and data-model assimilation strategies, and that improve model efficiency.
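As a minimal sketch of how a GP attaches a confidence interval to each prediction, the code below performs exact GP regression with a squared-exponential kernel in NumPy. It assumes a zero prior mean and fixed, hand-picked hyperparameters (`length_scale`, `signal_var`, `noise_var`); the helper names are illustrative, the data are synthetic, and none of the project-specific extensions listed above (physics-inspired priors, multi-output models, and so on) are included.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s^2 exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, signal_var=1.0, noise_var=0.01):
    """Exact GP regression (zero prior mean): predictive mean and variance at X_test."""
    K = rbf_kernel(X_train, X_train, length_scale, signal_var)
    K[np.diag_indices_from(K)] += noise_var
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    K_s = rbf_kernel(X_train, X_test, length_scale, signal_var)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Latent variance k(x*, x*) - k*^T K^{-1} k*, plus noise for a noisy observation.
    var = signal_var - np.sum(v**2, axis=0) + noise_var
    return mean, var


# Synthetic 1-D problem (assumption): noisy samples of sin(x) on [0, 5].
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

X_new = np.array([[2.5], [10.0]])  # one point inside, one far outside the training range
mean, var = gp_predict(X, y, X_new)
std = np.sqrt(var)
print("mean:", mean, "std:", std)
```

The predictive standard deviation stays close to the noise level inside the training range and grows back towards the prior standard deviation far from it, which is what makes GP error bars useful for flagging extrapolation. In practice the hyperparameters would be fitted, for instance by maximizing the log marginal likelihood, rather than fixed by hand.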

The WP2 developments in feature analysis, knowledge extraction, and causality in Earth observation data required improved measures of (conditional) independence, experiments designed in controlled situations, and high-quality data. We made advances in feature extraction and fusion methods, information coding methods, automatic feature ranking with GPs, and the incorporation of active learning and Bayesian optimization strategies. We also developed the framework of sensitivity maps for kernel methods, which provide information about the most sensitive features and their sampling characteristics, and which were ultimately deployed in model inversion and emulators. Importantly, we developed several causal discovery algorithms, based on Granger causality, unbiased convergent cross-mapping, additive noise models, and kernel deviance measures, and showcased their performance in Earth and climate sciences.
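The idea behind a sensitivity map can be sketched for a kernel ridge regression model f(x) = sum_i alpha_i k(x_i, x) with an RBF kernel: differentiate the model analytically with respect to each input feature and average the squared derivative over the data, so that features the model barely responds to receive a low score. This is one common formulation, written here as an assumption rather than as the project's exact definition; the function names, kernel width, and regularization value are hypothetical, and the data are synthetic, with the third feature irrelevant by construction.

```python
import numpy as np


def rbf(A, B, sigma):
    """RBF kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / sigma**2)


def fit_krr(X, y, sigma=1.0, lam=1e-2):
    """Kernel ridge regression dual weights: alpha = (K + lam I)^{-1} y."""
    K = rbf(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def sensitivity_map(X_train, alpha, X_eval, sigma=1.0):
    """Mean squared partial derivative of f(x) = sum_i alpha_i k(x_i, x), per feature."""
    K = rbf(X_train, X_eval, sigma)  # shape (n_train, n_eval)
    # diff[i, m, j] = X_eval[m, j] - X_train[i, j]
    diff = X_eval[None, :, :] - X_train[:, None, :]
    # df/dx_j at each evaluation point: sum_i alpha_i k(x_i, x) * (-(x_j - x_ij) / sigma^2)
    grads = -np.einsum("i,im,imj->mj", alpha, K, diff) / sigma**2
    return np.mean(grads**2, axis=0)  # one sensitivity score per feature


# Synthetic data (assumption): feature 0 matters most, feature 1 less, feature 2 not at all.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + 1.0 * X[:, 1]

alpha = fit_krr(X, y, sigma=0.5)
s = sensitivity_map(X, alpha, X, sigma=0.5)
ranking = np.argsort(s)[::-1]  # feature indices, most sensitive first
print("sensitivities:", s, "ranking:", ranking)
```

The resulting ranking orders the features by how strongly the fitted model depends on them. Because the derivatives are analytic, no finite-difference perturbations or model retraining are needed.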

WP3 showcased relevant applications in geosciences and remote sensing. We extended the application domain from land/vegetation to water (lakes, ocean) and the atmosphere (atmospheric temperature/moisture and trace gas profiles), and developed a wide diversity of methods and applications dealing with: 1) vegetation monitoring at local and regional scales (mainly chlorophyll content, leaf area index, FAPAR, sun-induced fluorescence, and water content), as well as new vegetation monitoring products derived from passive microwave vegetation optical depth (VOD); 2) monitoring of carbon, water, energy, and heat fluxes at the global scale (e.g. gross primary production, latent heat), plant drivers (e.g. plant traits, spring onset, maximum light use efficiency, wood density), and soil moisture; 3) water type classification, oceanic chlorophyll, suspended matter, and ocean colour monitoring; and 4) atmospheric profile parameter estimation (e.g. temperature, moisture).
We have made progress and had an impact on the scientific and technical communities in several ways: scientific (generation and dissemination of knowledge), educational (training of PhD students, undergraduate and master's lectures), technological (transfer of technology and methodologies), and societal (application of the developed regression methods to the highly relevant field of EO, with some applications to other fields of science and engineering). Scientific content has been disseminated mainly through (1) publications in reference journals and conferences in machine learning and signal processing (IEEE TNN, JMLR, PLOS One) and in the application community (Nature, Nature Communications, Science Advances, JGR, RSE, IEEE TGARS, IEEE GRSL), and (2) the organization of special sessions and satellite workshops within the main conferences of the targeted research domains (EGU, IGARSS, ESA, AGU), which bring together researchers in machine learning, geoscience, and remote sensing. The project has contributed to the (pre-)operational capacities of the EU Copernicus program by providing new methodologies suitable for the reliable retrieval, analysis, and assessment of land products. Many applications use Sentinel data for retrieval, or emulate leaf and canopy radiative transfer models (RTMs) for application to Sentinel-2/3 data. These goals, along with an improved understanding of uncertainties and the exploitation of large heterogeneous data, have recently been identified as cornerstones of 21st-century climate science. A by-product of SEDAL is the strengthening of the competitiveness of European fluorescence and carbon cycle research, achieved by bringing together different experimental, observational, and statistical inference communities for integrated assessment and knowledge generation.