CORDIS - EU research results

Big Data for 4D Global Urban Mapping – 10^16 Bytes from Social Media to EO Satellites

Periodic Reporting for period 4 - So2Sat (Big Data for 4D Global Urban Mapping – 10^16 Bytes from Social Media to EO Satellites)

Reporting period: 2021-11-01 to 2023-01-31

By 2050, around three quarters of the world’s population will live in cities. Despite increasing efforts, global urban mapping still lags behind the geometric, thematic and temporal resolutions of geo-information needed to address these challenges. Today, diverse but incomplete data sets exist. For example, Earth observation (EO) satellites reliably and routinely provide geodetically accurate, large-scale geo-information on cities from space, but data availability is limited by the resolutions and acquisition geometries of the sensors. Complementarily, massive amounts of imagery, text messages and GIS data from open sources and social media provide a temporally quasi-seamless, spatially multi-perspective information basis, but with unknown and diverse quality. So2Sat aims at the joint exploitation of big data from social media and satellite observations for global urban mapping, and at breakthroughs in 3D/4D urban modelling, infrastructure occupancy classification, and very high resolution population density mapping on a global scale, in order to revolutionize urban geographic research. In detail, the following methodological and application objectives will be addressed: improving urban-related information retrieval from EO satellite data, mining urban imagery and text messages from social media data, fusion of heterogeneous data sources, big data processing, and pilot application research on informal settlement classification and global population density estimation. The outcome of So2Sat will be the first and unique global and consistent spatial data set on the urban morphology (3D/4D) of settlements, and a multidisciplinary application derivative assessing population density.
The achievements in each methodological objective (MO) and application objective (AO) of the project are explained as follows.

MO1 Improving information retrieval: There are three sub-objectives in MO1: 1) robust Earth observation image denoising, 2) solving ill-conditioned and underdetermined problems, and 3) image fusion. Regarding the first, we developed a so-called nonlocal-means filter that significantly improves the signal-to-noise ratio (SNR) of TanDEM-X SAR images. This filtering algorithm was integrated into our radar tomography algorithm, which is used for global 3D building reconstruction (Shi et al., 2020a; Zhu et al., 2018). We achieved the second sub-objective by developing a compressive sensing-based radar tomographic algorithm (Shi et al., 2020a) that is tailored to solving ill-conditioned and heavily underdetermined problems. For the third, we developed multiple algorithms that go beyond the original objective of fusing images with different resolutions. A description can be found under MO3.
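
The general idea behind nonlocal-means filtering can be illustrated with a minimal sketch. It is a generic textbook variant for a real-valued 2D image, not the project's SAR-specific filter; the function name `nonlocal_means` and the parameters `patch_radius`, `search_radius` and `h` are assumptions chosen for illustration. Each pixel is replaced by a weighted average of the pixels in a search window, with larger weights for pixels whose surrounding patches look similar.

```python
import numpy as np

def nonlocal_means(image, patch_radius=1, search_radius=3, h=0.1):
    """Generic nonlocal-means filter for a 2D float image (illustrative only)."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    size = 2 * patch_radius + 1
    padded = np.pad(image, patch_radius + search_radius, mode="reflect")

    def window(dy, dx):
        # Image shifted by (dy, dx), including a patch_radius margin.
        top = search_radius + dy
        left = search_radius + dx
        return padded[top:top + rows + 2 * patch_radius,
                      left:left + cols + 2 * patch_radius]

    reference = window(0, 0)
    weighted_sum = np.zeros_like(image)
    weight_total = np.zeros_like(image)

    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            shifted = window(dy, dx)
            squared_diff = (reference - shifted) ** 2
            # Mean squared difference between the patch around each pixel
            # and the patch around its neighbour at offset (dy, dx).
            patch_dist = np.zeros_like(image)
            for py in range(size):
                for px in range(size):
                    patch_dist += squared_diff[py:py + rows, px:px + cols]
            patch_dist /= size * size
            # Similar patches get weights close to 1, dissimilar ones close to 0.
            weights = np.exp(-patch_dist / (h * h))
            neighbours = shifted[patch_radius:patch_radius + rows,
                                 patch_radius:patch_radius + cols]
            weighted_sum += weights * neighbours
            weight_total += weights

    return weighted_sum / weight_total
```

The smoothing parameter `h` controls how quickly weights decay with patch dissimilarity; SAR imagery has multiplicative speckle and complex-valued data, so a filter used in practice would need a similarity measure adapted to those statistics.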

MO2 Mining social media data: The three sub-objectives in MO2 are 1) finding feature representations of social media images, 2) finding efficient methods to update 3D building models, and 3) mining information from text messages and metadata. As a common prerequisite for these objectives, we developed stable processors to crawl Flickr images and tweets from the internet. To date, we have collected more than 25 million social media images and 1.5 billion tweets. Although this is a large quantity of social media images, they cover a broad variety of motifs, and only a small fraction contain clear and useful information about individual buildings. This renders the first objective, finding a unique representation of individual buildings in social media images, less relevant and the second objective unachievable, as there are not enough useful images of individual buildings. Hence, we shifted our research objective to developing novel algorithms to classify building functionality (Hoffmann et al., 2019). For the third point, we used Twitter data and implemented natural language processing methods, including self-trained multilingual Twitter word embeddings, multilingual large language models and deep learning architectures for classification (Häberle et al., 2022, 2019).

MO3 Optimal information fusion of heterogeneous data: in this MO, we tackle two types of data fusion challenges. One is the fusion of different types of EO images, and the other is the fusion of EO and social media data. Regarding the former one, we have specifically focused on the fusion of synthetic aperture radar data and multi-spectral imagery provided by the Sentinel-1 and Sentinel-2 missions. We developed a decision level fusion algorithm of SAR and optical images for better prediction of the so-called local climate zones classification (Zhu et al., 2022). For the latter, we developed a fusion algorithm of street view and aerial view images for building type classification and compared different types of fusion algorithms (Hoffmann et al., 2019). Last but not least, we also developed a fusion algorithm for social media text and images for building type classification, and results show that linguistic features can indeed improve the classification (Häberle et al., 2022).
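
Decision-level fusion combines the outputs of separately trained classifiers rather than their inputs or features. The sketch below shows one common variant, a weighted average of per-class probabilities followed by an arg-max; the report does not specify the fusion rule used in the project, so the rule itself, the function name `fuse_decisions` and the parameter `weight_sar` are assumptions for illustration.

```python
import numpy as np

def fuse_decisions(prob_sar, prob_optical, weight_sar=0.5):
    """Fuse per-class probabilities from two classifiers (illustrative only).

    prob_sar, prob_optical: arrays of shape (n_samples, n_classes), e.g.
    softmax outputs of a SAR-based and an optical-based model.
    """
    prob_sar = np.asarray(prob_sar, dtype=float)
    prob_optical = np.asarray(prob_optical, dtype=float)
    # Weighted average of the two probability vectors for every sample.
    fused = weight_sar * prob_sar + (1.0 - weight_sar) * prob_optical
    # The fused label is the class with the highest averaged probability.
    return fused.argmax(axis=-1), fused

labels, fused = fuse_decisions([[0.6, 0.4], [0.2, 0.8]],
                               [[0.3, 0.7], [0.1, 0.9]])
```

In the first sample the two models disagree, and the more confident optical prediction decides the fused label.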

MO4 Big data processing: In this MO, we developed a workflow control framework for our data processing on the supercomputer SuperMUC-NG of LRZ, as well as a machine learning model training and inference pipeline for global local climate zone classification. This framework is the basis for our global 3D reconstruction and classification processing.

AO1 Classification of global informal settlements: Towards the application objective of providing a detailed mapping of informal settlements, we have successfully carried out the first studies about deep learning techniques dedicated to this purpose (Stark et al., 2019; Wurm et al., 2019). We further developed a transfer learned fully convolutional Xception network (XFCN), trained on a conceptually consistently generated large sample of globally distributed slums. Transfer learning was implemented that helped to improve segmentation results when learning on a variety of slum morphologies, with high F1 scores of up to 89% (Stark et al., 2020). Furthermore, we used the classification results on the urban built environment in So2Sat to relate the physical appearance of the built urban landscape with multiple, intertwined socio-economic or planning processes (Stark et al., 2020).
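
For reference, the F1 score quoted above is the harmonic mean of precision and recall. A minimal sketch of how it could be computed for a binary segmentation mask follows; the function name `f1_score_binary` is hypothetical and the example masks are made up for illustration, not project data.

```python
import numpy as np

def f1_score_binary(predicted, truth):
    """F1 score of a binary mask against a reference mask (illustrative only)."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    true_pos = np.sum(predicted & truth)
    false_pos = np.sum(predicted & ~truth)
    false_neg = np.sum(~predicted & truth)
    denominator = 2 * true_pos + false_pos + false_neg
    # Define F1 as 0 when neither mask contains any positive pixel.
    return float(2 * true_pos / denominator) if denominator else 0.0

# One true positive, one false positive, one false negative.
score = f1_score_binary([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```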

AO2 Estimation of global population density: We have curated a large Earth observation dataset, So2Sat POP, for population estimation using machine learning (Doda et al., 2022). Follow-on work on population estimation using a model trained on the So2Sat POP dataset has been completed and submitted to a journal. Because the doctoral research started in October 2019, the work on building-level population density estimation is still ongoing.
Triggered by the need for methods from the field of artificial intelligence (AI) in the project, the project group has become a leading group with regard to the application of deep learning in Earth observation and has started to define the corresponding state-of-the-art in the remote sensing community. By the end of the project, the group will have consolidated this position, and several innovative algorithms fine-tuned to both semantic (2D) and topographic (3D/4D) urban analysis will have been developed and made available to the public in the form of open access publications.
Mapping Global Urban Morphology from Space by Deep Learning
First Impression of the Global 3D Urban Models from Earth Observation Data Science
So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification
So2Sat POP: An earth observation dataset for population density estimation
The coverage of our Global Building Footprint (G), comparing to Google (R) and OSM (B)
First Impression of the So2Sat Global Urban Models (3D + Semantics)