Impacts of Climate Extremes from Mining of Online Texts

Project information

ICE-MOT

Grant agreement ID: 101112727

DOI

10.3030/101112727

Closed project

EC signature date: 27 February 2023

Start date: 1 April 2023

End date: 30 September 2024

Funded under

European Research Council (ERC)

Total cost

No data

EU contribution

€ 150 000.00

Coordinated by

UPPSALA UNIVERSITET
Sweden

Periodic Reporting for period 1 - ICE-MOT (Impacts of Climate Extremes from Mining of Online Texts)

Reporting period: 2023-04-01 to 2024-09-30

Climate extremes have many detrimental impacts on human activities and ecosystems: temperature extremes cause excess deaths, hydrological extremes reduce crop yields or cause destructive floods, and windstorms damage infrastructure and utilities. A detailed understanding of these impacts is essential for disaster risk reduction and for preparing for extremes in a changing climate. However, current freely available multi-hazard, multi-impact databases have limitations that hinder further scientific and practical progress, including incomplete impact information, the need for work-intensive manual updating and a lack of scalability. This project aims to build a state-of-the-art impact database for extreme climate events, based on text mining of freely available online textual sources. Online news websites, national weather service or government reports, Wikipedia entries and other sources can provide a wealth of information on climate impacts, yet extracting this information manually is time-consuming. This project instead employs natural language processing techniques to automatically extract information for a global database of impacts from multiple categories of climate extremes. Such an undertaking brings key added value over current freely available multi-hazard, multi-impact databases.

During the initial part of the project, the bulk of the work went into manually annotating a series of articles on impacts of extreme events, to be used as training and test data for the large language models. This was followed by the development of an automated pipeline for processing textual information. More specifically, the pipeline leverages large language models and in-context learning to extract semi-structured information from textual sources, which is then normalized and refined in post-processing. A crucial step is geoparsing, which maps place names to geographical entities. An empirical evaluation on the manually annotated benchmark data shows that extraction accuracy varies across types of impact information, but is generally high.
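The extraction and post-processing steps described above can be illustrated with a minimal sketch. It is not the project's pipeline: the prompt wording, the field names (`hazard`, `location`, `deaths`), the example article and the helper names are all assumptions made for illustration, and the call to a language model is deliberately left out.

```python
import json
import re

# Hypothetical annotated example; not taken from the project's benchmark data.
FEW_SHOT_EXAMPLES = [
    {
        "text": "Flooding in Exampleville killed 12 people.",
        "record": {"hazard": "flood", "location": "Exampleville", "deaths": "12"},
    },
]

# Small lookup for spelled-out numbers (illustrative, far from complete).
_WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "ten": 10, "twelve": 12}


def build_prompt(article, examples=FEW_SHOT_EXAMPLES):
    """Assemble an in-context (few-shot) prompt ending with the new article."""
    parts = ["Extract the hazard, location and deaths from the text as JSON."]
    for example in examples:
        parts.append(
            f"Text: {example['text']}\nJSON: {json.dumps(example['record'])}"
        )
    parts.append(f"Text: {article}\nJSON:")
    return "\n\n".join(parts)


def normalise_count(raw):
    """Map strings such as '1,200', 'twelve' or 'at least 30' to an int."""
    if raw is None:
        return None
    text = str(raw).strip().lower()
    if text in _WORD_NUMBERS:
        return _WORD_NUMBERS[text]
    match = re.search(r"\d[\d,]*", text)
    if match:
        return int(match.group().replace(",", ""))
    return None  # leave unparseable values empty rather than guessing


def postprocess(model_output):
    """Parse the model's JSON answer and normalise selected fields."""
    record = json.loads(model_output)
    record["hazard"] = str(record.get("hazard", "")).strip().lower()
    record["deaths"] = normalise_count(record.get("deaths"))
    return record
```

In this sketch the model's answer is treated as semi-structured text: the JSON is parsed first, and only then are free-form values such as "about 1,500" reduced to numbers. Geoparsing would be a further step that resolves the `location` string against a gazetteer.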

Further work has focussed on connecting the impacts of individual extreme events to the corresponding physical hazards. To do so, we first conducted a case-study analysis to understand how sensitive our results are to the choice of metric used to identify the physical extremes. We then developed a number of algorithms to automatically detect climate extremes, both in isolation and when multiple extremes co-occur. This work has highlighted the shortcomings of existing impact databases, whose impact entries often do not match the hazards recorded in historical climate data. In particular, we conducted an in-depth analysis of, and intercomparison with, the EM-DAT data. These results fed into a publicly available, one-of-a-kind global database of impacts from multiple categories of climate extremes.
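The report does not specify the detection algorithms, so the following is only a sketch of one common approach, under stated assumptions: an extreme is a run of at least `min_length` consecutive days above a percentile threshold of a daily series, and an impact entry is matched to an event if its date falls within the event window plus a tolerance. The function names, the 95th percentile, the minimum run length and the two-day tolerance are all illustrative choices, not the project's settings.

```python
import math
from datetime import timedelta


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def detect_events(dates, values, q=95, min_length=3):
    """Return (start, end) pairs for runs of days exceeding the threshold.

    Assumes `dates` are consecutive calendar days aligned with `values`.
    """
    threshold = percentile(values, q)
    events, run = [], []
    for day, value in zip(dates, values):
        if value > threshold:
            run.append(day)
            continue
        if len(run) >= min_length:
            events.append((run[0], run[-1]))
        run = []
    if len(run) >= min_length:  # close a run that reaches the end of the series
        events.append((run[0], run[-1]))
    return events


def match_impact(impact_date, events, tolerance_days=2):
    """Return the first event whose padded window contains the impact date."""
    pad = timedelta(days=tolerance_days)
    for start, end in events:
        if start - pad <= impact_date <= end + pad:
            return (start, end)
    return None  # the impact entry has no matching physical hazard
```

Changing `q`, `min_length` or the underlying variable alters which impact entries find a matching hazard, which is the kind of metric sensitivity the case-study analysis examined. Entries for which `match_impact` returns `None` correspond to impact records that do not line up with any detected extreme.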

We have also worked on weakly supervised language processing to derive indirect and cascading impacts. However, this has not yielded robust results with regard to the clustering of the impact categories or the ability of our pipeline to extract quantitative information on these impacts. We have therefore decided not to include these results in the initial version of the database. The scientific challenges encountered in this part of the work have resulted in a contribution to a perspective paper on the dynamics of multi-sector impacts of extremes. Extracting indirect and/or cascading impacts was a particularly high-risk, high-gain part of the project, and the possibility that it would not succeed had been identified as an implementation risk in the project plan. Accordingly, the rest of the project was structured so as not to depend on this part of the analysis.

The pipeline that we developed and made publicly available, which automatically processes textual information and compiles it into a structured database of impacts of climate extremes, is thus far unique. It has a clear advantage over existing methodologies for compiling climate impact databases: the resulting database is easily updated by running new texts through the pipeline. Unlike many existing databases, it also records the source of each item of impact information, enabling manual verification by users interested in specific climatic events. The resulting global database of impacts from multiple categories of climate extremes holds concrete scientific and economic potential. Scientifically, it enables novel studies on climate impacts beyond what has been possible with existing databases. Economically, we have engaged stakeholders and industrial partners, who have been actively involved in discussions on the design and use cases of the database and have confirmed their concrete interest in using it. For future commercial applications of updated versions of the database, IPR and commercialisation support will be needed.
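The two properties highlighted above, updating by running new texts through the pipeline and keeping the source of every impact item, can be sketched as a small data structure. This is a hypothetical illustration, not the schema of the published database: the class names, the fields (`event`, `category`, `value`, `source`) and the `extract` callback standing in for the pipeline are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ImpactItem:
    """One piece of impact information together with its provenance."""
    event: str
    category: str
    value: Optional[int]
    source: str  # e.g. the URL of the text the item was extracted from


@dataclass
class ImpactDatabase:
    items: List[ImpactItem] = field(default_factory=list)

    def ingest(
        self,
        text: str,
        source: str,
        extract: Callable[[str], Iterable[Dict]],
    ) -> int:
        """Run a new text through the extractor and add any unseen items."""
        added = 0
        for fields in extract(text):
            item = ImpactItem(source=source, **fields)
            if item not in self.items:  # skip exact duplicates
                self.items.append(item)
                added += 1
        return added

    def sources_for(self, event: str) -> List[str]:
        """List the sources a user would consult to verify an event."""
        return sorted({item.source for item in self.items if item.event == event})
```

Because the source is part of each item, the same figure reported by two texts is stored twice, once per source, so a user can compare the reports rather than see a single unattributed value.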

Contents and automated pipeline of the database of impacts of climate extremes
