As on the Web, accessing and integrating information from large numbers of heterogeneous streaming sources under diverse ownership and control is a resource-intensive and cumbersome task without proper support. Streaming data generated by the data acquisition infrastructures of Smart Cities, social network applications, medical sensors, and similar sources is still handled largely through static data silos, large warehousing systems that are often unmanageable, and rudimentary user interfaces that trade intuitiveness for completeness. Effectively exploiting continuous data streams from multiple sources requires an infrastructure that supports the intensive work of enriching, linking, and correlating data streams with very large static data collections, while also combining the results into increasingly complex data objects that represent realistic models of the world. When these data objects are represented as graphs or networks (that is, as sets of triples), complex processing pipelines can be described and implemented by combining the expressive power of declarative queries with high-level data-parallel programming paradigms that rely on elastic execution on cloud infrastructure. Linked Stream Data applies the Linked Data model to stream data, using graphs as the basic representation.
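To make the idea of enriching a data stream with static data concrete, the following is a minimal sketch, assuming stream items are RDF-like `(subject, predicate, object)` triples tagged with an arrival timestamp and that enrichment means joining on the shared subject within a time window. The function name `enrich_window`, the `ex:` predicates, and the sensor identifiers are hypothetical and chosen only for illustration; this is not the project's implementation.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
TimedTriple = Tuple[Triple, float]  # (triple, arrival timestamp in seconds)


def enrich_window(
    stream: List[TimedTriple],
    static: List[Triple],
    now: float,
    window: float,
) -> List[Tuple[Triple, Triple]]:
    """Hypothetical sketch: join stream triples whose timestamp lies in
    (now - window, now] with static triples that share the same subject."""
    # Index the static collection by subject once, so each stream triple
    # is matched with a dictionary lookup instead of a full scan.
    by_subject: Dict[str, List[Triple]] = {}
    for triple in static:
        by_subject.setdefault(triple[0], []).append(triple)

    joined = []
    for triple, ts in stream:
        # Keep only triples that fall inside the current time window.
        if not (now - window < ts <= now):
            continue
        # Pair the stream triple with every static fact about its subject.
        for match in by_subject.get(triple[0], []):
            joined.append((triple, match))
    return joined


stream = [
    (("sensor:1", "ex:temperature", "31"), 10.0),
    (("sensor:2", "ex:temperature", "18"), 3.0),
]
static = [
    ("sensor:1", "ex:locatedIn", "room:A"),
    ("sensor:2", "ex:locatedIn", "room:B"),
]
print(enrich_window(stream, static, now=12.0, window=5.0))
```

With `now=12.0` and `window=5.0`, only the `sensor:1` reading (timestamp 10.0) is inside the window, so it alone is linked to its static location fact. Indexing the static side by subject reflects the usual asymmetry in this setting: the static collection is large and stable, while the stream side is small per window.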
To address this need, this project designed algorithms and a solution based on the RDF data model that uses graphs as the basic representation of stream data while meeting the scalability and low-latency requirements of high-volume, highly dynamic data. The Graph of Everything is the graph representation of data generated by the Internet of Everything, an extension of the Internet of Things (IoT) in which smart devices, personal clouds, wearable technology, big data, and networks are all interconnected via the Internet. The solution provides a declarative mechanism to filter, aggregate, enrich, and analyze high-throughput data from multiple disparate live sources, in any data format, in order to identify simple and complex patterns, visualize business activity in real time, detect urgent situations, and automate immediate actions. In SMARTER, the unified view of all heterogeneous data sources provided by the RDF data model enables transparent and cost-effective integration of analytical components built on a graph-based data model. To deal with the heterogeneity of dynamic data sources, the project proposed an approach for transforming data from a variety of formats and sources so that it is ready for further processing and high-level analytical operations. For scalability, the project investigated a workflow-based solution coupled with a declarative continuous query language over Linked Stream Data, so that data analytics pipelines can be created interactively and visually. In addition to elastically scaling continuous analytical processing on cloud infrastructure, the project also aims to contribute to the adaptive optimisation of multiple continuous queries deployed across distributed stream processing instances.
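As a rough illustration of the filter, aggregate, and detect steps described above, the sketch below evaluates one windowed query over a stream of timestamped triples: it keeps readings of a single predicate inside the window, averages them per subject, and reports subjects whose average exceeds a threshold. It is written as plain Python rather than in a continuous query language, and the function name `window_alerts`, the `ex:` predicates, and the threshold value are hypothetical assumptions, not part of the project's design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
TimedTriple = Tuple[Triple, float]  # (triple, arrival timestamp in seconds)


def window_alerts(
    stream: List[TimedTriple],
    now: float,
    window: float,
    predicate: str,
    threshold: float,
) -> Dict[str, float]:
    """Hypothetical sketch: per-subject average of numeric objects for one
    predicate within (now - window, now], keeping only averages above threshold."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for (subject, pred, obj), ts in stream:
        # Filter: wrong predicate or outside the time window.
        if pred != predicate or not (now - window < ts <= now):
            continue
        # Aggregate: accumulate the numeric object value per subject.
        sums[subject] += float(obj)
        counts[subject] += 1
    # Detect: report subjects whose windowed average exceeds the threshold.
    averages = {s: sums[s] / counts[s] for s in sums}
    return {s: avg for s, avg in averages.items() if avg > threshold}


stream = [
    (("sensor:1", "ex:temperature", "30"), 8.0),
    (("sensor:1", "ex:temperature", "34"), 11.0),
    (("sensor:2", "ex:temperature", "18"), 11.0),
    (("sensor:1", "ex:humidity", "80"), 11.0),
]
print(window_alerts(stream, now=12.0, window=5.0,
                    predicate="ex:temperature", threshold=25.0))
```

Here `sensor:1` averages 32.0 over the window and is reported, while `sensor:2` (18.0) is not, and the humidity reading is filtered out by predicate. In a real continuous query engine this evaluation would be re-triggered as the window slides rather than called once.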