
Accurate and Scalable Processing of Big Data in Earth Observation

Periodic Reporting for period 2 - BigEarth (Accurate and Scalable Processing of Big Data in Earth Observation)

Reporting period: 2019-10-01 to 2021-03-31

With unprecedented advances in satellite technology, recent years have seen a significant increase in the volume of Earth observation (EO) data archives. For example, through the Copernicus programme, the European flagship satellite initiative in EO, Sentinel satellites acquire roughly 12 TB of satellite images per day, and the total size of the Copernicus data archives is almost 20 PB. Accordingly, accurate and scalable systems that discover crucial knowledge about the state of our planet from massive EO data archives have recently emerged. Existing systems allow users to query the satellite images required for a given EO application based on keywords/tags that describe the sensor type, geographical location and acquisition time of the images stored in the archives. However, in the era of big data, the content of the satellite data is much more relevant than the keywords/tags. To keep up with the growing need for automation, knowledge discovery systems and tools that operate on the content of the satellite images are necessary.
In the ERC BigEarth project, we develop cutting-edge methods for: i) large-scale image representation learning; and ii) large-scale image search and retrieval for accurate and fast discovery of crucial information for observing Earth from big EO archives. The methods developed in the ERC BigEarth project provide the foundations for knowledge discovery systems that index and query the complex content of large-scale EO data in a scalable and accurate manner. In detail, the BigEarth project consists of five main Aims: four are associated with the development of novel methodologies and tools that address the main challenges of big EO data, and one concerns the construction of a benchmark archive to validate the algorithms and the software.

Aim 1: Development of novel methods and tools to characterize and exploit high level semantic and spectral information present in remote sensing (RS) images;
Aim 2: Development of novel feature extraction methods and tools to directly extract features from the compressed RS images;
Aim 3: Development of accurate and scalable RS image indexing and retrieval methods together with associated tools;
Aim 4: Development of methods and tools to integrate feature representations of different RS image sources into a unified form of feature representation;
Aim 5: Construction of a benchmark archive with a high number of multi-source RS images.

The methods and algorithms developed in the BigEarth project: 1) address the challenges of knowledge discovery from big data archives for EO, which contributes to the EU’s Artificial Intelligence research and innovation agenda; and 2) ease information discovery from massive archives through efficient and effective modelling, indexing and querying of the complex content of RS images (going beyond simple keyword/tag-based search).
We have developed several methods and algorithms for remote sensing (RS) image classification, search and retrieval that support fast and accurate information discovery from massive data archives.
Deep learning (DL) based methods have become popular in RS image retrieval, indexing and scene classification. Most existing DL based methods assume that training images are annotated with single labels; however, RS images typically contain multiple classes and can thus be associated with multiple labels simultaneously. Despite the success of existing methods in describing the information content of very high resolution aerial images with RGB bands, any direct adaptation to high-dimensional, high-spatial-resolution RS images falls short of accurately modelling their spectral and spatial information content. To address this problem, we developed several methods for the multi-label classification, indexing and retrieval of high-dimensional RS images. As an example, one of our recent methods describes the complex spatial and spectral content of local image areas with a novel K-Branch Convolutional Neural Network (CNN) that includes spatial-resolution-specific CNN branches. It then characterizes the importance scores of the different local areas of each image and defines a global descriptor for each image based on these scores. This is achieved by a novel multi-attention strategy that utilizes bidirectional long short-term memory networks. Moreover, to accurately describe the relationships between the objects present in RS images and their attributes, we investigated RS image captioning and developed a novel retrieval system that exploits image captions generated by a novel deep captioning algorithm.
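As a rough illustration of two ideas in the paragraph above, the sketch below combines local-area descriptors into one global descriptor using importance scores, and turns per-class outputs into a multi-label prediction. It is not the project's K-Branch CNN or its multi-attention strategy: the importance scores are simply taken as given, and the softmax normalization, the 0.5 threshold and all function and class names are assumptions made for this example.

```python
import math


def softmax(scores):
    """Normalize raw importance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_descriptor(local_descriptors, importance_scores):
    """Weighted sum of local-area descriptors (assumed pooling scheme)."""
    weights = softmax(importance_scores)
    dim = len(local_descriptors[0])
    return [
        sum(w * d[i] for w, d in zip(weights, local_descriptors))
        for i in range(dim)
    ]


def predict_labels(logits, class_names, threshold=0.5):
    """Multi-label decision: each class is accepted independently."""
    labels = []
    for name, z in zip(class_names, logits):
        probability = 1.0 / (1.0 + math.exp(-z))  # per-class sigmoid
        if probability >= threshold:
            labels.append(name)
    return labels


if __name__ == "__main__":
    # Two hypothetical local areas with equal importance.
    print(global_descriptor([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    # Hypothetical class names and logits for one image patch.
    print(predict_labels([2.0, -1.0, 0.3], ["forest", "water", "urban"]))
```

Because every class gets its own sigmoid and threshold, one image can receive several labels at once, which is the difference from a single-label classifier that picks only the highest-scoring class.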
Most DL models require huge amounts of annotated images during training to optimize all parameters and reach high performance during evaluation. The availability and quality of such data determine the feasibility of many DL models. To address this issue, we have recently introduced BigEarthNet, a large-scale benchmark archive for RS image understanding (available at http://bigearth.net). The BigEarthNet benchmark archive (which is made up of 590,326 Sentinel-2 image patches) enables the use of data-hungry DL algorithms for multi-label RS image retrieval and classification tasks. BigEarthNet thus represents a significant advancement for the use of DL in RS, opening up promising directions for DL-based research in RS image scene classification and retrieval. All the data and the DL models are publicly available, offering an important resource to guide future progress on image scene classification and retrieval problems in RS.
Due to the dramatically increased volume of RS image archives, images are usually stored in compressed format to reduce storage size. Existing content-based RS image retrieval and classification systems require fully decoded images as input, which makes large-scale image retrieval computationally demanding. To overcome this limitation, we developed novel systems, such as: 1) a system that achieves coarse-to-fine progressive RS image description and retrieval in the partially decoded JPEG 2000 compressed domain; and 2) a system that applies scene classification with deep neural networks in the JPEG 2000 compressed domain. The developed systems significantly reduce computational time while achieving retrieval and classification accuracies similar to those of traditional approaches. To achieve highly time-efficient search within huge data archives, we also investigated deep hashing methods that encode high-dimensional image descriptors into a low-dimensional Hamming space, where the image descriptors are represented by binary hash codes. In this way, the (approximate) nearest neighbors among the images can be efficiently identified based on the Hamming distance with simple bit-wise operations. One of the recent methods that we developed is a metric-learning based hashing network, which learns: 1) a semantic-based metric space for effective feature representation; and 2) compact binary hash codes for fast archive search. Our network relies on an interplay of multiple loss functions to jointly learn a metric-based semantic space in which similar images are clustered together and, at the same time, to produce compact final activations that lose negligible information when binarized.
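The search step described above, comparing binary hash codes by Hamming distance with bit-wise operations, can be sketched as follows. This is a minimal illustration, not the project's hashing network: the sign-based binarization rule, the 4-bit codes, the patch names and the linear scan over the archive are all assumptions made for the example.

```python
def binarize(activations):
    """Pack the signs of real-valued activations into an integer hash code.

    Assumed rule: a positive activation becomes bit 1, anything else bit 0.
    """
    code = 0
    for value in activations:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Count differing bits: XOR the two codes, then count the set bits."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, archive_codes, k=3):
    """Return the names of the k archive items closest to the query.

    Ties are broken by name so that the ranking is deterministic.
    """
    ranked = sorted(
        archive_codes.items(),
        key=lambda item: (hamming_distance(query_code, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical archive of three patches with 4-bit hash codes.
    archive = {"patch_a": 0b1010, "patch_b": 0b1000, "patch_c": 0b0101}
    query = binarize([0.9, -0.8, 0.7, -0.6])  # yields the code 0b1010
    print(retrieve(query, archive, k=2))
```

Each comparison costs one XOR and one bit count regardless of how many dimensions the original image descriptor had, which is why compact binary codes make search over very large archives cheap.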
The BigEarth project addresses emerging methods and tools that allow efficient information discovery from massive Earth Observation (EO) data archives. As an example, the BigEarth team has recently developed the BigEarthNet benchmark archive to drive research and innovation in machine learning, in particular deep learning, for EO. BigEarthNet opens up promising directions for the analysis of large-scale EO data archives. The dataset is accessible through its own webpage, in Google Earth Engine and in TensorFlow. We are currently enriching the BigEarthNet archive by: i) extending it to the whole of Europe; ii) including Sentinel-1 SAR patches; iii) including different types of auxiliary data (e.g. digital elevation models); and iv) including the class-wise appearance percentages in each image. These enrichments will contribute to our ongoing research and development on multi-modal/cross-modal content-based RS image search and retrieval. We are also currently developing 3D deep compression methods for remote sensing images with high spectral and spatial resolution. The development of a multi-modal RS image retrieval system that operates in the 3D compressed domain is planned next. The developed system will be evaluated on the multi-modal BigEarthNet benchmark archive that we are currently constructing.
BigEarthNet Users