Periodic Reporting for period 1 - MORE (Management of Real-time Energy Data)
Reporting period: 2020-10-01 to 2022-03-31
Objective 1: Creating a platform that can consume billions of streams and hundreds of petabytes of data.
MORE will address this challenge with a twofold strategy: a) edge analytics that perform fast, simple event detection locally and b) time series modelling and compression, supported by the edge processing. We plan a system in which data are modelled and summarized throughout the data processing pipeline and analytics are applied directly to the summaries.
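The idea of running analytics directly on summaries can be illustrated with a minimal sketch. It assumes a simple piecewise-constant model with a user-chosen absolute error bound; the names `compress` and `approximate_sum` are hypothetical and this is not the platform's actual implementation.

```python
from typing import List, Tuple

# (start index, number of points, representative value) -- hypothetical layout
Segment = Tuple[int, int, float]


def compress(values: List[float], error_bound: float) -> List[Segment]:
    """Greedily group consecutive points into constant segments.

    Each segment stores the midrange of its points, so every original
    point lies within `error_bound` of the stored value.
    """
    segments: List[Segment] = []
    if not values:
        return segments
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        v = values[i]
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo > 2 * error_bound:
            # Adding v would break the bound: close the current segment.
            segments.append((start, i - start, (lo + hi) / 2))
            start, lo, hi = i, v, v
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(values) - start, (lo + hi) / 2))
    return segments


def approximate_sum(segments: List[Segment]) -> float:
    """Compute a sum from the summaries alone, without decompressing."""
    return sum(length * value for _, length, value in segments)
```

Under these assumptions, a sum computed from the segments differs from the exact sum by at most the error bound times the number of points, which is what makes querying the summaries instead of the raw stream attractive.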
Objective 2: Accurate prediction, forecasting and diagnostics.
MORE aims for accurate prediction and diagnostics by focusing on incremental machine learning algorithms, which scale better than most existing approaches and are updated continuously. It will also work on highly parallelizable pattern extraction methods, focusing on motif extraction techniques developed specifically for time series data. By extracting motifs from the collected historical data, we can identify patterns that are linked to important events in the RES installation or that reveal properties of its components.
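As an illustration of what "updated continuously" means for an incremental learner, the following is a minimal sketch of an online linear regressor trained with one stochastic gradient step per observation. The class name `OnlineLinearRegressor`, its parameters and the choice of a linear model are assumptions made for this example; the report does not state which algorithms the project uses.

```python
from typing import List, Sequence


class OnlineLinearRegressor:
    """Linear model updated one observation at a time (hypothetical example)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: Sequence[float], y: float) -> float:
        """Take one gradient step on the squared error for (x, y).

        Returns the prediction error measured before the update.
        """
        error = self.predict(x) - y
        self.bias -= self.learning_rate * error
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        return error
```

Each call to `update` touches only the current observation, so memory use stays constant however long the stream runs, and the model can follow gradual changes in the installation without retraining from scratch.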
Objective 3: Reduce the human effort for building complex learning models.
Limited computational resources and a lack of sophisticated techniques are not the only obstacles to extracting the desired information from time series data. A significant problem is the human effort required to parametrize a machine learning model. AutoML is a set of automation services that help the end user perform feature extraction, model tuning and other time-consuming and complex tasks. However, existing AutoML approaches do not cover incremental machine learning algorithms (IMLA) or ML algorithms for time series.
Objective 4: Have a tangible result on RES management.
In MORE we strive not just to beat the benchmarks, but also to provide solutions validated by our industry partners. The industry partners in the consortium have identified several important goals for RES data analysis. These goals derive not only from their own data analysis efforts, but also from several other RES sector stakeholders who need feedback from the analysis of monitored data. The patterns and features that the RES industry is interested in indicate conditions with important operational and financial impact.
Working with the vast amounts of data held by our industry partners revealed several challenges that could not have been fully foreseen. One of them is the variety of the data: each installation produces data describing a different production mechanism, i.e. each source is unique because of differences in technology, placement, age and other factors. The most important challenge posed by this variety is the lack of labelled data, i.e. data that have been explicitly associated with specific conditions such as soiling. Since each RES installation differs significantly from the rest, in theory labelled data are needed for each one. In MORE we have put great effort into overcoming this obstacle and into extracting labels for important problems, e.g. soiling, from raw monitoring data in a semi-automatic way.
The consortium developed the architecture of the platform, which defines an edge-cloud model for analytics, prediction and forecasting models that fully exploits the compression offered by ModelarDB. The architecture supports lightweight analytics on the edge and allows transferring large amounts of data to the cloud for resource-intensive analytics.
Algorithms have been designed and implemented to support distribution and scalability. They are parallelizable and built on state-of-the-art parallelization techniques. Scalability enhancements and the evaluation of the algorithms' capabilities will be the focus of upcoming work, as the emphasis shifts from creating suitable tools (accurate models for prediction, forecasting and diagnostics) that provide the desired functionality for each solar park or turbine to adding the scalability needed to offer this functionality across a large number of different solar and wind parks simultaneously.
The work in the first half of the project has progressed as prescribed in the work plan, and the initial versions of all components are ready. These include model-based compression and storage; edge and cloud data ingestion, storage and the query engine; edge and cloud analytics; the incremental machine learning algorithms; AutoML; the pattern extraction tools; complex event detection; and the visual analytics module.