Content archived on 2022-12-23

Environmental data mining: learning algorithms and statistical tools for monitoring and forecasting

Objective

To support the ongoing effort to develop indicators for environmentally sustainable development, research is needed on technologies that help maintain environmental quality (water, air, soil). The first step of such a research programme is to collect and analyse data in order to provide useful tools for environmental monitoring and forecasting. Such tools would also help with pollution prevention and compliance with environmental laws. Furthermore, if properly managed, they can be applied to environmental protection, to public information, and to lowering operational costs in industry.

Thanks to low-cost sensors and to the new generation of SCADA (supervisory control and data acquisition) systems widely used for environmental monitoring, a large amount of raw environmental data is now available in large databases and can be used to build these tools. The usual approach to using this data is to build physical models and fit them to the available data. However, because of the nature of environmental phenomena (noise, non-linearity, non-stationarity, missing data), the data often does not satisfy the underlying hypotheses of these physical models. An alternative approach is to rely on the data itself to build models, following the principles of statistical learning theory (SLT). On this basis, our research project aims to provide tools for analysing these databases in order to retrieve models, an activity called "environmental data mining". It should be noted that one important particularity of environmental data, compared with the data handled by usual data mining techniques, is its spatio-temporal structure. The objective of the proposal is to develop adaptive methodology and tools to tackle this specific problem, building on previous work in the fields of artificial intelligence, learning theory, statistics, geostatistics and time series analysis.

In such cases, solving prediction and modelling problems requires innovative technology and a multidisciplinary approach, both of which are realised in data mining and knowledge discovery. Knowledge discovery in databases is the process of identifying valid, novel, potentially useful, and ultimately understandable structure in data. The global problems to be solved by data mining are to explain the data, to make predictions about the data, and to summarise a large database to facilitate decision-making. The main objectives of the proposal are: to develop an environmental data mining methodology for structuring and building a framework for environmental data processing; to develop new function estimation and classification algorithms (identification and prediction); to develop and adapt methods for the detection, analysis, modelling and prediction of extreme events in spatio-temporal processes; and to develop tools for image and shape analysis of descriptive input data and for data interpolation and simulation.

The scientific method used in this project consists in first solving real environmental problems (case studies in water pollution analysis, air quality forecasting and risk assessment) and then comparing results and methodology in order to generalise. The following research activities will be carried out within the project: analysis and modelling of long- and short-term time events, and identification and monitoring of extreme events. The following tools will be used: support vector machines for one-class and multiclass problems, support vector regression, artificial neural networks, a multi-scale kernel approach, geostatistics and stochastic simulation. The central SLT question of model selection will be addressed for this specific data using the Bayesian framework.
The first expected results are the case studies, which demonstrate how to solve particular problems. Based on these case studies, further results will be a demonstration of a new guideline and methodology for environmental data mining, plus free software. The development of new data-based theoretical models for environmental analysis, prediction and extreme event detection is also expected. At the end of the project, educational tools (methodology + case studies + new algorithms + software) will also be provided to illustrate the proposed approach to environmental data mining.
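
As a rough illustration of what building a model directly from monitoring data can look like, the sketch below fits a Gaussian-kernel smoother (Nadaraya-Watson regression) to a short sensor series containing missing readings. This is not the project's method: the technique is a simple stand-in for the kernel-based tools named above, and the function name `kernel_regression`, the bandwidth value and the sample readings are all hypothetical, chosen only for illustration.

```python
import math


def kernel_regression(times, values, query, bandwidth):
    """Nadaraya-Watson estimate of the signal at time `query`.

    Each observed sample is weighted by a Gaussian kernel of width
    `bandwidth` centred on `query`. Samples whose value is None
    (missing sensor readings) are skipped rather than imputed.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for t, v in zip(times, values):
        if v is None:
            continue  # missing reading: contributes nothing
        w = math.exp(-0.5 * ((query - t) / bandwidth) ** 2)
        weighted_sum += w * v
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("no usable samples near the query point")
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical hourly pollutant readings; None marks a sensor gap.
    hours = [0, 1, 2, 3, 4, 5, 6]
    readings = [12.0, 13.5, None, 15.2, None, 14.1, 13.0]
    # Estimate the value at each gap from the neighbouring readings.
    for h in (2, 4):
        estimate = kernel_regression(hours, readings, h, bandwidth=1.0)
        print(f"hour {h}: estimated reading {estimate:.2f}")
```

The bandwidth controls how far the smoother looks in time. Choosing it is a small instance of the model selection question mentioned above, which the project proposes to address within a Bayesian framework and which is not attempted here.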

Call for proposal

Data not available

Funding Scheme

Data not available

Coordinator

Institut National des Sciences Appliquées
EU contribution
No data
Address
Place Emile Blondel
76131 Mont Saint Aignan
France

Total cost
No data

Participants (4)