CORDIS - EU research results

A Planetary Inventory of Life – a New Synthesis Built on Big Data Combined with Novel Statistical Methods

Periodic Reporting for period 3 - LIFEPLAN (A Planetary Inventory of Life – a New Synthesis Built on Big Data Combined with Novel Statistical Methods)

Reporting period: 2022-07-01 to 2023-12-31

While humanity is planning to conquer other planets, we still know little about how life is structured across our own. This is a chilling state of affairs. Accounting for biodiversity patterns is the basis for any sustainable management of natural resources in the face of ongoing global change. Our current ignorance concerns not only global species richness, but also how these species are structured into communities and how they interact with each other. Current syntheses have typically been compiled from mixed data points gathered by different approaches. Our current understanding of biodiversity tends to derive from studies examining a fraction of the overall diversity. What we think we know about biodiversity is often based on the least diverse parts of biological life.

There are two main reasons why we understand biodiversity and its drivers so poorly. First, we lack the relevant data, since for the vast majority of species we have either no data or very sporadic data. Second, the processes underlying biodiversity dynamics are complex, and we lack the tools for converting the data that we do have into a true understanding of the processes behind them.

Through LIFEPLAN, we will overcome both hurdles. We bring together the key expertise needed to generate and interpret Big Ecological Data for a global synthesis of biotic patterning across our planet, uniting community ecology, methods for automated species recognition, and Bayesian statistics for very large datasets.

As a basis for the whole LIFEPLAN venture, we will generate a well-standardized global dataset for a substantial proportion of all species. Such standardization is achieved through semi-automated methods, producing comparable data independent of the exact expertise of the person or team conducting the sampling. Based on a recent revolution in sampling methodology, such a sampling design is now finally achievable.

Since starting LIFEPLAN, we have built a network of 104 volunteer teams around the world. These teams conduct LIFEPLAN sampling on a global scale, using camera traps, audio recorders, insect traps and fungal spore samplers, together with soil sampling. We have purchased and sent equipment to the teams, and assisted them with acquiring permits and complying with the relevant international legislation. We have created a mobile application and a website for keeping track of tens of thousands of samples, and a cloud-based system for digital data transfer over the internet. We have also started our own sampling using the same methods on a finer spatial scale, at an additional 68 locations in Sweden and Madagascar.

To identify the species in the samples, images and sounds that are being collected, we have developed machine learning models. The massive amounts of data we are collecting make it impossible for individual human experts to go through it all, which is why we need machine learning methods. We have also set up websites for collecting the training data that will be required for the identification task. The training data will be sound, image, and DNA barcode libraries of known species.

A major challenge is that we are discovering many new species. We have addressed this challenge by developing a classification approach that uses probabilities to represent uncertainty in classification and taxa discovery. We have also developed a new approach for predicting the number of new taxa that would be discovered if a given number of additional samples were processed, which provides valuable information for sampling design and for predicting biodiversity. This approach also adds to the statistical literature on species sampling models, which is relevant to very broad applications beyond ecology.
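
The new prediction approach itself is not described in this summary. As a point of reference only, the question it addresses (how many unseen taxa would turn up in additional samples) can be illustrated with the classical Good-Toulmin estimator, a much older and simpler technique than the Bayesian species sampling models mentioned above. The function name and the toy labels below are hypothetical, and the estimator is only reliable when the additional sampling effort does not exceed the original effort.

```python
from collections import Counter


def good_toulmin_new_taxa(labels, extra_fraction):
    """Classical Good-Toulmin estimate of the number of previously unseen
    taxa expected in `extra_fraction * len(labels)` additional samples.

    This is an illustrative stand-in, not the LIFEPLAN method.
    `labels` holds one taxon label per processed sample (hypothetical input).
    """
    if not 0 <= extra_fraction <= 1:
        # The alternating series becomes unstable beyond a doubling of effort.
        raise ValueError("extra_fraction must lie in [0, 1]")

    # Number of samples assigned to each taxon.
    abundance = Counter(labels)
    # Number of taxa observed exactly j times (the "frequencies of frequencies").
    freq_of_freq = Counter(abundance.values())

    # U(t) = sum_j (-1)^(j+1) * t^j * Phi_j
    return sum(
        (-1) ** (j + 1) * extra_fraction ** j * n_taxa
        for j, n_taxa in freq_of_freq.items()
    )


# Toy example: seven samples covering four taxa.
observed = ["a", "a", "b", "c", "c", "c", "d"]
print(good_toulmin_new_taxa(observed, 0.5))  # 0.875
```

In this toy case two taxa were seen once, one twice and one three times, so processing half as many samples again is estimated to reveal a little under one new taxon. Estimates of this kind are driven mostly by the rarely observed taxa.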

Beyond collecting and analyzing our own data, a major part of LIFEPLAN is developing new methods for big data statistics. We have developed multiple new modelling frameworks that can flexibly adapt to the types of structure common in spatial ecology data, as well as in many other applications. We have produced multiple algorithms for more efficient computation in the modelling of large spatial data; these algorithms can handle a broad range of data types and models. We have developed two new classes of algorithms to enable much faster Bayesian statistical analyses of very long time series data, while maintaining theoretical guarantees on the accuracy of the approximations employed.

Setting up the global LIFEPLAN sampling infrastructure is a breakthrough that advances biodiversity research significantly beyond the state of the art. This type of systematic and taxonomically comprehensive data on major organism groups such as arthropods and fungi was not previously available. We expect these data to make a major contribution, both through our own forthcoming scientific publications and through being made openly available.

A number of the statistical developments are also significantly beyond the state of the art in the field. Particularly notable are new structured frameworks for factor analysis, new paradigms for Gaussian process modelling with the input domain having unknown restrictions, and a new framework for species sampling modelling, improving upon the broad Bayesian nonparametric statistical literature on this topic. In addition, our articles on scalable algorithms for spatial and temporal data are significantly beyond the state of the art.

As both the data collection and the statistical work continue, we will next be able to apply the new methods we develop to our new data. We expect this to lead to a transformative new understanding of life on Earth as we put together new models of how species are distributed across the globe.
LIFEPLAN in relation to the current state of biodiversity knowledge
Map of LIFEPLAN sampling sites