Periodic Reporting for period 1 - DeepGeo (Deep Gaussian Processes for Geostatistical Data Analysis)
Reporting period: 2018-07-23 to 2020-07-22
During the project, we showed how to push current state-of-the-art methods to their full potential. We also developed new methods that better understand when they are uncertain about a prediction, as well as methods that can discover new types of pollution by exploiting correlations with known types of pollution. Unfortunately, we also discovered that soil pollution cannot be modelled accurately with the soil samples being collected today under current regulatory requirements: the samples are taken too far apart, leaving blind spots between them in which pollution hotspots can go undetected.
We also discovered that most AI methods assume that the input we give them is precisely known. This is not always the case - for instance, we may not know exactly where on an old industrial site a soil sample was collected - and not taking this uncertainty into account can lead to overly confident predictions of pollution. By developing a new way to include this uncertainty, we created a new method that has a much better understanding of how confident it should be about a certain prediction of pollution.
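As an illustration of why input uncertainty matters, the sketch below marginalises an uncertain sample location out of an ordinary Gaussian-process prediction by simple Monte Carlo. It is a minimal toy example, assuming a one-dimensional transect and a squared-exponential kernel, and is not the project's actual method:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D location arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-4):
    """Standard GP posterior mean and variance at x_test."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    var = np.diag(rbf(x_test, x_test) - Ks.T @ np.linalg.solve(K, Ks))
    return mean, var

def gp_predict_uncertain_input(x_train, y_train, x_loc, x_std,
                               n_samples=2000, seed=0):
    """Marginalise over an uncertain input location x ~ N(x_loc, x_std^2)
    by Monte Carlo -- a crude stand-in for an analytic treatment."""
    rng = np.random.default_rng(seed)
    xs = rng.normal(x_loc, x_std, size=n_samples)
    means, vars_ = gp_predict(x_train, y_train, xs)
    # Law of total variance: E[var(y|x)] + var(E[y|x])
    return means.mean(), vars_.mean() + means.var()

# Toy data: a pollutant concentration varying along a transect
x = np.linspace(0, 5, 8)
y = np.sin(x)

_, v_exact = gp_predict(x, y, np.array([2.5]))
_, v_unc = gp_predict_uncertain_input(x, y, x_loc=2.5, x_std=0.5)

# Accounting for location uncertainty inflates the predictive variance,
# so the model no longer claims more confidence than the data supports.
assert v_exact[0] < v_unc
```

The key point is the law-of-total-variance line: even a crude Monte Carlo over the unknown sample location widens the predictive uncertainty, which is exactly the behaviour an overconfident model lacks.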
We then looked at how we could get an AI system not only to model pollution by many different chemicals at once but also to learn how the amounts of these chemicals relate to each other. By combining two popular AI methods, we developed a new hybrid method that can predict the amount of a chemical much more accurately when other chemicals have been measured nearby.
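The idea of borrowing strength across chemicals can be sketched with an intrinsic coregionalisation model, where one cross-chemical covariance matrix B couples a shared spatial kernel across outputs. This is a minimal stand-in for the hybrid method described above; the correlation value in B and the toy measurements are invented for illustration:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential spatial kernel (1-D locations)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def icm_kernel(x1, t1, x2, t2, B, ls=1.0):
    """Intrinsic coregionalisation: cov = B[task_i, task_j] * k(x_i, x_j)."""
    return B[np.ix_(t1, t2)] * rbf(x1, x2, ls)

def gp_posterior(x_tr, t_tr, y_tr, x_s, t_s, B, noise=1e-4):
    """Posterior mean and variance of a multi-output GP at (x_s, t_s)."""
    K = icm_kernel(x_tr, t_tr, x_tr, t_tr, B) + noise * np.eye(len(x_tr))
    Ks = icm_kernel(x_tr, t_tr, x_s, t_s, B)
    Kss = icm_kernel(x_s, t_s, x_s, t_s, B)
    mean = Ks.T @ np.linalg.solve(K, y_tr)
    var = np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))
    return mean, var

# Task 0 = a previously measured chemical, task 1 = the chemical of
# interest; B encodes a strong (made-up) correlation between them.
B = np.array([[1.0, 0.9],
              [0.9, 1.0]])

# Chemical 1 measured only at the ends of a transect...
x_b = np.array([0.0, 4.0]); t_b = np.array([1, 1]); y_b = np.array([0.2, 0.8])
# ...while chemical 0 was also measured in the middle,
# right where we want to predict chemical 1.
x_a = np.array([2.0]);      t_a = np.array([0]);    y_a = np.array([0.5])

x_s = np.array([2.0]); t_s = np.array([1])
_, var_alone = gp_posterior(x_b, t_b, y_b, x_s, t_s, B)
_, var_joint = gp_posterior(np.concatenate([x_b, x_a]),
                            np.concatenate([t_b, t_a]),
                            np.concatenate([y_b, y_a]),
                            x_s, t_s, B)

# A nearby measurement of the correlated chemical sharply reduces
# the uncertainty about the chemical that was not measured there.
assert var_joint[0] < var_alone[0]
```

The final comparison shows the mechanism in miniature: the cross-chemical entry of B lets a measurement of one chemical tighten the prediction for another at the same location.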
Finally, we discovered that predicting soil pollution using AI or any other statistical method is not possible with today's data on soil pollution. This is because soil samples are collected too far apart at polluted sites, leading to blind spots between the samples that could, potentially, be heavily polluted. The required distance between soil samples is determined by law, so we will draft new proposals for these regulations and inform policymakers about the issue.
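To see how sample spacing creates blind spots, the toy simulation below estimates how often a circular hotspot falls entirely between the points of a square sampling grid. The grid spacing and hotspot size are invented for illustration and are not taken from any actual regulation:

```python
import random

def miss_probability(spacing, hotspot_radius, trials=20000, seed=1):
    """Monte Carlo estimate of the chance that a circular hotspot,
    dropped uniformly into one cell of a square sampling grid,
    contains none of the four surrounding sample points.
    Illustrative geometry only."""
    rng = random.Random(seed)
    corners = [(0.0, 0.0), (spacing, 0.0), (0.0, spacing), (spacing, spacing)]
    misses = 0
    for _ in range(trials):
        # Random hotspot centre within one grid cell
        cx = rng.uniform(0, spacing)
        cy = rng.uniform(0, spacing)
        hit = any((cx - px) ** 2 + (cy - py) ** 2 <= hotspot_radius ** 2
                  for px, py in corners)
        misses += not hit
    return misses / trials

# A hotspot 10 m across is almost always missed by a 30 m sampling grid
p = miss_probability(spacing=30.0, hotspot_radius=5.0)
assert p > 0.8
```

Analytically, for a hotspot radius r no larger than half the spacing d, the hit probability is just the covered area fraction pi*r^2/d^2 (about 9% here), which the simulation reproduces; denser grids or wider hotspots are the only ways to shrink the blind spots.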
Our work on including uncertainty in AI methods is also very general and can be used in many different situations. When it comes to pollution, this means that methods will have a much better understanding of how confident they should be about their predictions. We can, therefore, avoid having methods that claim that there is only very little pollution, when in fact there is a lot, just because they did not consider how precisely we knew their inputs.
Finally, we developed a new method that can not only model many different types of pollution at once but also find, and take advantage of, much more complicated correlations between them than previous methods could. For example, we can train the method to find correlations between previously measured pollution and new types of pollution. We can then predict how much of the new pollution there is anywhere we have measurements of the old type, without having to go out and obtain new, expensive samples.
We also discovered, unfortunately, that it is not possible to use AI or any other statistical method to predict soil pollution based on the samples we are collecting today. We are simply taking soil samples too far apart, which has large societal implications, as we may miss hotspots of pollution. By drafting new guidelines for soil sampling based on the findings in this project, we hope to draw policymakers' attention to this problem.
It may not be possible to use AI to predict soil pollution because of the current regulations, but the methods we developed are completely general and can be used for many other types of data. For instance, they can be used to predict air pollution or to help in modelling climate change.