Periodic Reporting for period 1 - SMARTER (A Scalable and Elastic Platform for Near-Realtime Analytics for The Graph of Everything)
Reporting period: 2016-04-20 to 2018-04-19
To deal with fast update streams in conjunction with massive volumes of data, the project investigated how to create a native and adaptive solution for storing and processing data of the Graph of Everything on cloud infrastructure. Whereas conventional relational database infrastructures are crumbling under the volume of this data and the tight constraints of their schemata, SMARTER relied on a novel graph-based computing paradigm mapped onto a massively parallel processing architecture powered by elastic computing platforms such as Apache Flink. Furthermore, because the RDF-based model places no conditions on the structure of the data that can be processed, the project's solution supports pre-computing slowly updated data for continuous queries over highly dynamic data from streams; in extensive experiments in a standalone environment, this pre-computation delivered a large performance gain. Velocity reflects the need for "just-in-time" processing of dynamic stream data, i.e., incoming stream data flows into processing pipelines and must be matched against patterns in the static data.
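The benefit of pre-computing the slowly updated data can be illustrated with a minimal Python sketch: the static data is indexed once, so each incoming stream element only needs a lookup rather than a scan. It assumes triples are plain (subject, predicate, object) string tuples and uses hypothetical predicates `locatedIn` and `hasReading`; it is a generic hash-based stream-static join, not SMARTER's actual implementation, which ran on platforms such as Apache Flink.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A triple is (subject, predicate, object); plain strings stand in for RDF terms.
Triple = Tuple[str, str, str]


def build_static_index(static_triples: Iterable[Triple], predicate: str) -> Dict[str, List[str]]:
    """Pre-compute subject -> objects for one predicate of the slowly updated data."""
    index: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in static_triples:
        if p == predicate:
            index[s].append(o)
    return dict(index)


def match_stream(stream: Iterable[Triple], index: Dict[str, List[str]],
                 stream_predicate: str) -> Iterator[Tuple[str, str, str]]:
    """Join each incoming triple with the pre-computed index on its subject.

    Each stream element costs one dictionary lookup instead of a scan of the static data.
    """
    for s, p, o in stream:
        if p != stream_predicate:
            continue  # the element does not match the stream pattern
        for static_obj in index.get(s, []):
            yield (s, o, static_obj)


# Usage with hypothetical sensor data
static = [
    ("sensor1", "locatedIn", "room1"),
    ("sensor2", "locatedIn", "room2"),
    ("room1", "type", "Office"),
]
stream = [
    ("sensor1", "hasReading", "21.5"),
    ("sensor3", "hasReading", "19.0"),  # no static match, so it is dropped
    ("sensor2", "hasReading", "22.1"),
]
idx = build_static_index(static, "locatedIn")
for match in match_stream(stream, idx, "hasReading"):
    print(match)
# ('sensor1', '21.5', 'room1')
# ('sensor2', '22.1', 'room2')
```

The index is built once and reused for every stream element; it only needs rebuilding when the static data changes, which is the premise behind treating slowly updated data differently from the stream.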
To address this need, the project carried out studies to design algorithms and provide a solution based on the RDF data model, leveraging graphs as the basic representation for stream data while meeting the scalability and low-latency processing requirements of large-volume, highly dynamic data. The Graph of Everything is the graph representation of data generated by the Internet of Everything, which is regarded as an extension of the Internet of Things (IoT) in which smart devices, personal clouds, wearable technology, big data, and networks are all interconnected via the Internet. The solution provides a declarative mechanism to filter, aggregate, enrich, and analyze high-throughput data from multiple disparate live data sources, in any data format, in order to identify simple and complex patterns, visualize business activity in real time, detect urgent situations, and automate immediate actions. In SMARTER, the unified view of all heterogeneous data sources provided by the RDF data model facilitates the transparent and cost-effective integration of analytical computing components built on a graph-based data model. To deal with the heterogeneity of dynamic data sources, the project proposed an approach for transforming data from a variety of formats and sources so that it is ready for further processing and high-level analytical operations. For scaling, the project investigated a workflow-based solution coupled with a declarative continuous query language over Linked Stream Data, so that a data analytics pipeline can be created interactively and visually. Along with its contribution to elastically scaling continuous analytical processing on cloud infrastructure, the project also aims to make new contributions on the adaptive optimisation of multiple continuous queries deployed across distributed stream processing instances.
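How a unified graph view lets heterogeneous sources feed one analytical step can be sketched in Python. This is a hypothetical illustration, not the transformation approach SMARTER proposed: it maps JSON and CSV records onto plain (subject, predicate, object) tuples under an assumed namespace `http://example.org/`, with an assumed `id` field as the subject, and then runs a single filter-and-aggregate step over the merged triples. A real deployment would more likely use an RDF library with proper IRIs and typed literals.

```python
import csv
import io
import json
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"  # hypothetical namespace for subjects and predicates


def json_to_triples(line: str) -> Iterator[Triple]:
    """Map one JSON record to triples; the assumed 'id' field becomes the subject."""
    record = json.loads(line)
    subject = BASE + record["id"]
    for key, value in record.items():
        if key != "id":
            yield (subject, BASE + key, str(value))


def csv_to_triples(text: str) -> Iterator[Triple]:
    """Map each CSV row to triples using the header names as predicates."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["id"]
        for key, value in row.items():
            if key != "id":
                yield (subject, BASE + key, value)


def average_by_subject(triples: Iterable[Triple], predicate: str) -> Dict[str, float]:
    """Filter on one predicate, then average its numeric objects per subject."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for s, p, o in triples:
        if p == predicate:
            sums[s] += float(o)
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


# Usage: two sources in different formats, one query over the unified triples
json_line = '{"id": "sensor1", "temperature": 21.0}'
csv_text = "id,temperature\nsensor1,23.0\nsensor2,19.5\n"
triples = list(chain(json_to_triples(json_line), csv_to_triples(csv_text)))
print(average_by_subject(triples, BASE + "temperature"))
# {'http://example.org/sensor1': 22.0, 'http://example.org/sensor2': 19.5}
```

Once both sources share the triple form, the aggregation step needs no knowledge of where each record came from, which is the practical payoff of a unified graph-based view.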