Management of Real-time Energy Data

Periodic Reporting for period 1 - MORE (Management of Real-time Energy Data)

Reporting period: 2020-10-01 to 2022-03-31

The energy industry, and especially the Renewable Energy Sources (RES) sector, is one of the sectors that create huge volumes of sensor data. The technical challenges are staggering: a single wind turbine has 7000 sensors, whose readings we would ideally like to report at sub-second intervals (currently infeasible due to data management limitations), while manufacturers monitor thousands of turbines. Moreover, the economic impact is huge: the RES industry was estimated at 1,469 billion USD in 2017, so even small increases in the productivity and competitive advantage of EU companies in the sector will have a significant impact on the EU economy. MORE focuses on creating software tools for data analytics tailored to the needs of RES stakeholders. The main objectives of MORE are:
Objective 1: Creating a platform that can consume billions of streams and hundreds of petabytes of data.
MORE will address this challenge with a twofold strategy: a) edge analytics to perform fast, simple event detection locally and b) time series modelling and compression, supported by the edge processing. We plan a system where data will be modelled and summarized throughout the whole data processing pipeline, and analytics will be applied directly to the summaries.
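The idea of applying analytics directly to summaries can be illustrated with a minimal sketch. It assumes a simple constant-value model with an absolute error bound, similar in spirit to the Poor Man's Compression family of models used by model-based stores such as ModelarDB; this sketch uses the midrange of each run as its value. It is not ModelarDB's implementation or API, and the names `Segment`, `compress_constant`, `approx_sum` and `approx_mean` are hypothetical. Because every sample is within the bound of its segment's value, aggregates computed from the segments alone inherit a known error bound.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """A run of consecutive samples summarised by one constant value (hypothetical type)."""
    start: int     # index of the first sample covered
    length: int    # number of samples covered
    value: float   # midrange of the run; every covered sample is within the error bound


def compress_constant(samples: Sequence[float], error_bound: float) -> List[Segment]:
    """Greedily grow a segment while a single constant can represent every
    sample in it within +/- error_bound, i.e. while (max - min) / 2 <= error_bound."""
    if not samples:
        return []
    segments: List[Segment] = []
    start, lo, hi = 0, samples[0], samples[0]
    for i in range(1, len(samples)):
        new_lo, new_hi = min(lo, samples[i]), max(hi, samples[i])
        if (new_hi - new_lo) / 2 <= error_bound:
            lo, hi = new_lo, new_hi                        # sample fits: extend the segment
        else:
            segments.append(Segment(start, i - start, (lo + hi) / 2))
            start, lo, hi = i, samples[i], samples[i]      # start a new segment
    segments.append(Segment(start, len(samples) - start, (lo + hi) / 2))
    return segments


def approx_sum(segments: Sequence[Segment]) -> float:
    """SUM computed from the summaries alone; its error is at most n * error_bound."""
    return sum(s.value * s.length for s in segments)


def approx_mean(segments: Sequence[Segment]) -> float:
    """AVG computed from the summaries alone; its error is at most error_bound."""
    n = sum(s.length for s in segments)
    return approx_sum(segments) / n


if __name__ == "__main__":
    power = [10.0, 10.2, 9.9, 10.1, 15.0, 15.3, 14.8, 15.1]   # hypothetical kW readings
    segs = compress_constant(power, error_bound=0.5)
    print(f"{len(segs)} segments for {len(power)} samples")
    print(approx_mean(segs), sum(power) / len(power))
```

Here eight readings collapse into two segments, and the mean is answered from the two segments without decompressing; with longer stable periods the ratio of samples to segments grows accordingly.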
Objective 2: Accurate prediction, forecasting and diagnostics.
MORE will achieve its aim of accurate prediction and diagnostics by focusing on incremental machine learning algorithms, which are updated continuously and can scale better than most existing approaches. Moreover, it will also work on highly parallelizable pattern extraction methods, focusing on motif extraction techniques developed specifically for time series data. By extracting motifs from the collected historical data, we can identify patterns that are linked to important events in the RES installation or that reveal properties of its components.
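To make "motif extraction" concrete, the following is a minimal brute-force sketch of finding the top motif: the closest pair of non-overlapping, z-normalised subsequences of length `m`. It is an assumption for illustration, not the MORE pattern extraction method; `znorm` and `top_motif` are hypothetical names. Matrix-profile algorithms compute the same nearest-neighbour information much faster and parallelise well, which is closer to what a scalable tool would use.

```python
import math
from typing import List, Sequence, Tuple


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalise a subsequence so that motifs match on shape, not on offset or scale."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd < 1e-12:                       # flat window: treat as all zeros
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def top_motif(series: Sequence[float], m: int) -> Tuple[int, int, float]:
    """Return (i, j, d): start offsets of the closest pair of non-overlapping
    length-m windows and their z-normalised Euclidean distance.

    Brute force, O(n^2 * m); intended only to show what a motif is.
    """
    if m < 1 or len(series) < 2 * m:
        raise ValueError("need m >= 1 and room for two non-overlapping windows")
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (-1, -1, math.inf)
    for i in range(len(windows)):
        for j in range(i + m, len(windows)):   # j >= i + m skips trivial self-matches
            d = math.dist(windows[i], windows[j])
            if d < best[2]:
                best = (i, j, d)
    return best


if __name__ == "__main__":
    pattern = [0.0, 1.0, 3.0, 1.0, 0.0]                  # hypothetical recurring shape
    noise = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1, 0.5, -0.2]
    # The second occurrence is scaled and shifted; z-normalisation still matches it.
    series = pattern + noise + [2 * v + 5 for v in pattern] + [0.0, 0.3]
    print(top_motif(series, m=5))
```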
Objective 3: Reduce the human effort for building complex learning models.
Limited computational resources and a lack of sophisticated techniques are not the only obstacles to extracting the desired information from time series data. A significant problem is the human effort required to parametrize a machine learning model. AutoML is a set of automation services that help the end user perform feature extraction, model tuning and other time-consuming and complex tasks. However, existing AutoML approaches do not cover incremental machine learning algorithms (IMLA) or ML algorithms for time series.
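One way AutoML could be adapted to incremental learners is to score hyperparameter candidates prequentially (test-then-train), since an incremental model has no fixed training/test split. The sketch below assumes an online linear regressor trained by stochastic gradient descent and a grid search over its learning rate; `OnlineLinearRegressor`, `prequential_mae` and `tune_learning_rate` are hypothetical names, and this is not the MORE AutoML component.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], float]   # (feature vector, target)


class OnlineLinearRegressor:
    """Linear model y ~ w.x + b, updated one sample at a time by SGD on squared error."""

    def __init__(self, n_features: int, learning_rate: float):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def learn_one(self, x: Sequence[float], y: float) -> None:
        err = self.predict(x) - y        # derivative of 0.5 * err**2 w.r.t. the prediction
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_mae(stream: Sequence[Sample], n_features: int, lr: float) -> float:
    """Test-then-train: score each sample before the model learns from it, so no
    separate hold-out set is needed and the score reflects continuous updating."""
    model = OnlineLinearRegressor(n_features, lr)
    total = 0.0
    for x, y in stream:
        total += abs(model.predict(x) - y)
        model.learn_one(x, y)
    return total / len(stream)


def tune_learning_rate(stream: Sequence[Sample], n_features: int,
                       candidates: Sequence[float]) -> Tuple[float, Dict[float, float]]:
    """Minimal AutoML-style step: grid search scored by prequential MAE."""
    scores = {lr: prequential_mae(stream, n_features, lr) for lr in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    stream: List[Sample] = []
    for _ in range(2000):
        wind = rng.uniform(3.0, 12.0)                      # hypothetical wind speed, m/s
        stream.append(([wind], 2.0 * wind + 1.0 + rng.gauss(0.0, 0.1)))
    best, scores = tune_learning_rate(stream, n_features=1, candidates=[1e-4, 1e-3, 1e-2])
    print("best learning rate:", best, scores)
```

A real AutoML service would search over more than one hyperparameter and over model families, but the prequential score is what lets the search run on a stream.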
Objective 4: Have a tangible impact on RES management.
In MORE we strive not just to beat the benchmarks, but also to provide solutions validated by our industry partners. The industry partners in the consortium have identified several important goals for RES data analysis. These goals derive not only from their own efforts in data analysis, but also from several other RES stakeholders who need feedback from the analysis of monitoring data. The patterns and features that the RES industry seeks indicate conditions with significant operational and financial impact.
The major focus of the first period was to understand and solve core problems that arise in energy analytics for Renewable Energy Sources. The consortium's industry partners provided data from various solar parks and wind turbines and supported different problems and use cases. This was not a one-off process; to this day, the industry partners continue to provide the research partners with new datasets and new annotations on the datasets to enhance our work on accurate analytics.
Working with the vast amounts of data held by our industry partners revealed several challenges that could not have been fully foreseen. One of them is the variety of the data: each installation produces data that describe a different production mechanism, i.e. each source is unique due to technological differences, differences in placement, age, etc. The most important challenge posed by this variety is the lack of labelled data, i.e. data that have been explicitly associated with specific conditions, e.g. soiling. Since each RES installation differs significantly from the rest, in principle we would need labelled data for each one. In MORE we have put great effort into overcoming this obstacle and extracting labels for important problems, e.g. soiling, semi-automatically from raw monitoring data.
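The report does not describe how labels are extracted, so the following is only a heuristic sketch under stated assumptions. It assumes that soiling shows up as a gradual decline in a solar park's daily performance ratio (measured energy divided by energy expected from irradiance) that ends with a sudden recovery when panels are cleaned or washed by rain. The functions `performance_ratio` and `soiling_candidates` and their thresholds are hypothetical; the output is a candidate label for an expert to confirm, which is what makes the process semi-automatic.

```python
from typing import List, Sequence, Tuple


def performance_ratio(measured_kwh: Sequence[float], expected_kwh: Sequence[float]) -> List[float]:
    """Daily measured energy divided by the energy expected from irradiance (assumed > 0)."""
    return [m / e for m, e in zip(measured_kwh, expected_kwh)]


def _slope(ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against day index 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def soiling_candidates(pr: Sequence[float], jump: float = 0.03, min_days: int = 5,
                       max_slope: float = -0.001) -> List[Tuple[int, int]]:
    """Propose [start, end) day ranges that look like soiling: a stretch between
    sudden PR recoveries (e.g. cleaning or rain) over which PR steadily declines."""
    if min_days < 2:
        raise ValueError("min_days must be at least 2 to fit a slope")
    # Day indices where PR jumps up by more than `jump` compared with the day before.
    recoveries = [d for d in range(1, len(pr)) if pr[d] - pr[d - 1] > jump]
    bounds = [0] + recoveries + [len(pr)]
    candidates = []
    for start, end in zip(bounds, bounds[1:]):
        if end - start >= min_days and _slope(pr[start:end]) < max_slope:
            candidates.append((start, end))
    return candidates


if __name__ == "__main__":
    # Synthetic PR: ten days of steady decline, a cleaning on day 10, then six clean days.
    pr = [0.95 - 0.01 * d for d in range(10)] + [0.95] * 6
    print(soiling_candidates(pr))   # [(0, 10)]
```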
The consortium developed the architecture of the platform, which defines an edge-cloud model for analytics and for prediction and forecasting models that fully exploits the compression offered by ModelarDB. The architecture supports lightweight analytics on the edge and allows large amounts of data to be transferred to the cloud for resource-intensive analytics.
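As an example of the kind of lightweight analytics that fits on an edge device, the sketch below runs simple event detection in a single pass with constant memory and emits one compact record per event instead of the raw readings. The hysteresis threshold rule, the `Event` type and `detect_events` are assumptions for illustration, not components of the MORE architecture.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Event:
    start: int   # timestamp at which the value first exceeded raise_level
    end: int     # timestamp at which it first fell below clear_level again


def detect_events(readings: Iterable[Tuple[int, float]],
                  raise_level: float, clear_level: float) -> Iterator[Event]:
    """Single pass over (timestamp, value) pairs with O(1) state. Using a lower
    clear level than raise level (hysteresis) keeps a signal that hovers near
    the threshold from producing a burst of tiny events. An event still open
    when the stream ends is not yielded."""
    if clear_level > raise_level:
        raise ValueError("clear_level must not exceed raise_level")
    open_since: Optional[int] = None
    for ts, value in readings:
        if open_since is None and value > raise_level:
            open_since = ts                  # alarm raised
        elif open_since is not None and value < clear_level:
            yield Event(open_since, ts)      # alarm cleared: emit one compact record
            open_since = None


if __name__ == "__main__":
    # Hypothetical gearbox temperatures (deg C), one reading per tick.
    temps = [70, 79, 81, 79.5, 80.5, 76, 74, 82, 83, 76]
    for event in detect_events(enumerate(temps), raise_level=80, clear_level=77):
        print(event)    # only these records would need to leave the edge device
```

With a single threshold of 80, the dip to 79.5 at tick 3 would split the first excursion into two events; the separate clear level keeps it as one.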
Algorithms have been designed and implemented to support distribution and scalability; they are built using state-of-the-art parallelization techniques. Scalability enhancements and the evaluation of the algorithms' capabilities will be the focus of upcoming work, as the main weight of the work shifts from creating suitable tools (accurate models for prediction, forecasting and diagnostics) that provide the desired functionality for each solar park or turbine, to adding the scalability capabilities needed to offer this functionality over vast numbers of different solar and wind parks simultaneously.
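The report does not say which parallelization pattern the algorithms use. One common pattern for distributing analytics over many installations is to have each worker compute a small, mergeable partial summary of its partition and then combine the partials in any order. The sketch below shows this for mean and variance using Welford's single-pass update per partition and Chan et al.'s pairwise combination formula; `Moments` is a hypothetical name.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Moments:
    """Count, mean and sum of squared deviations (m2) of one partition of a series."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(values: Iterable[float]) -> "Moments":
        # Welford's single-pass update, run independently on each partition/worker.
        n, mean, m2 = 0, 0.0, 0.0
        for x in values:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        return Moments(n, mean, m2)

    def merge(self, other: "Moments") -> "Moments":
        # Chan et al. pairwise combination: associative, so partial results from
        # any number of workers can be reduced in any grouping.
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of all values summarised so far."""
        return self.m2 / self.n if self.n else float("nan")


if __name__ == "__main__":
    # Hypothetical partitions, e.g. one per turbine or per worker process.
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    partials = [Moments.of(p) for p in partitions]   # could run on separate workers
    total = reduce(Moments.merge, partials, Moments())
    print(total.n, total.mean, total.variance)
```

Only three numbers per partition travel between workers, so the cost of combining results does not grow with the amount of raw data behind each partial.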
Work in the first half of the project has progressed according to the work plan, and initial versions of all components are ready. These include the model-based compression and storage; the edge and cloud data ingestion, storage and query engine; the edge and cloud analytics; the Incremental Machine Learning algorithms and the AutoML; the pattern extraction tools; the complex event detection; and the visual analytics module.
In the second period, MORE will build on the tangible outcomes of the first period (libraries, tools, evaluation results) to produce an integrated framework of methods and tools for scalable and accurate analytics of RES data. Particular emphasis will be given to scalability, ensuring that the implemented data mining and machine learning methods can be executed efficiently in distributed settings on tens of thousands of RES modules. Another important expected result is the integration of the already developed components, which will allow: (a) full exploitation of ModelarDB's compression, storage and querying capabilities by the analytics modules and (b) interactive visualization of datasets and analytics results for end users, which will increase the value of our methods in handling real-world use cases. Finally, we will target the adaptation, extension and generalization of the implemented machine learning and pattern extraction/detection methods, to effectively handle all the prescribed use cases. This, in conjunction with the extension and enhancement of the visualization utilities and the stakeholder feedback elicited from the pilots, will ensure that the final version of the MORE framework can perform meaningful analytics on RES data, accurately solve problems that are currently unhandled, and produce real value for RES stakeholders (e.g. in terms of increased production or optimized maintenance procedures). MORE envisions that, by developing its novel, data-driven techniques and demonstrating their value, it will have a large socio-economic impact on the field of RES.
MORE Architecture