PROTEUS follows an iterative, scenario-guided approach organised in incremental stages. First, the project specified its requirements and objectives, together with the Data Management Plan. This produced a complete catalogue of functional requirements (types of data, types of operations, types of queries, etc.) and non-functional requirements (response time, effectiveness, etc.) derived from the industrial scenario, as well as a Data Management Plan for handling training data and real data during the evaluation tests. Next, the validation scenario was set up, including benchmarks and KPIs, which resulted in a detailed description of how to run the new technology in the industrial system. Finally, at the time of this report, a first prototype of the system was being tested. An integrated processing engine for analysing data-at-rest and data-in-motion in a hybrid, merged fashion within the Apache Flink data platform was deployed as its basis. A prototype version of the stream analytics library was integrated on top of it, enabling operations (moments, heavy hitters, event detection, subsampling, semiring statistics) that provide actionable insights for decision making.
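To give a concrete sense of one of the stream operations listed above, heavy hitters, the following is a minimal sketch of the classic Misra-Gries summary. It is not the PROTEUS or SOLMA implementation. The sketch assumes a single-process Python iterable instead of a distributed Flink data stream, and the function name `misra_gries` and parameter `k` are hypothetical. With k - 1 counters, any item occurring more than n/k times in a stream of length n is guaranteed to survive. Its count is a lower bound on the true frequency.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return at most k - 1 heavy-hitter candidates with lower-bound counts.

    Every item whose true frequency exceeds n / k (n = stream length)
    is guaranteed to appear among the returned keys.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Already tracked: just increment.
            counters[item] += 1
        elif len(counters) < k - 1:
            # Free slot: start tracking the new item.
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # 'a' occurs 6 times in 10 items, more than 10 / 3, so it must be reported.
    print(misra_gries(list("aabacbadaa"), k=3))
```

Memory stays bounded by k - 1 counters regardless of stream length, which is what makes this family of summaries suitable for unbounded data-in-motion.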
PROTEUS follows an Open Source policy to release its core outcomes, using GitHub as the code repository, open to the community (https://github.com/PROTEUS-H2020). The key available results include:
A) PROTEUS Engine: an overhauled version of Apache Flink supporting hybrid computation on batch datasets and data streams.
B) PROTEUS Language: a declarative language library for Scalable Data Analysis.
C) PEACH - Proteus Elastic Cache: the PROTEUS elastic distributed cache.
D) PROTEUS Incremental Analytics: a backend module that implements incremental versions of the most common analytics operations, using approximations to keep their computational cost at ~O(1).
E) SOLMA - Scalable Online Machine Learning and Data Mining Algorithms: a scalable library for the Apache Flink data analytics platform, consisting of efficient distributed online algorithms for basic utilities and sketches, as well as advanced online predictive analytics.
F) PROTEIC.JS: an HTML5 and CSS3 charts library, ready for batch and streaming data.
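The incremental, constant-cost style of computation described for PROTEUS Incremental Analytics can be illustrated with running moments. The sketch below uses Welford's update for count, mean and variance, plus the standard pairwise merge of two partial summaries. It is a hypothetical example in plain Python, not the PROTEUS backend API. The class name `RunningMoments` and its methods are assumptions. Unlike the approximate operations mentioned above, this particular update is exact. It only illustrates the O(1)-per-element pattern.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Incrementally maintained count, mean and population variance (Welford)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # O(1) time and memory per incoming element.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 when no elements have been seen.
        return self.m2 / self.count if self.count else 0.0

    def merge(self, other: "RunningMoments") -> "RunningMoments":
        # Combine two partial summaries (e.g. from separate partitions).
        n = self.count + other.count
        if n == 0:
            return RunningMoments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return RunningMoments(n, mean, m2)


if __name__ == "__main__":
    rm = RunningMoments()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:
        rm.update(value)
    print(rm.count, rm.mean, rm.variance)
```

The `merge` step shows one possible way a summary computed over historical data could be combined with one maintained over a live stream. This is an assumption offered for illustration, not a description of how the PROTEUS engine combines data-at-rest and data-in-motion.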