Periodic Reporting for period 1 - DataCloud (ENABLING THE BIG DATA PIPELINE LIFECYCLE ON THE COMPUTING CONTINUUM)
Reporting period: 2021-01-01 to 2021-12-31
DataCloud aims to create a novel paradigm for Big Data pipeline processing over heterogeneous resources encompassing the Cloud/Edge/Fog Computing Continuum, covering the complete lifecycle of managing Big Data pipelines. DataCloud aims to make this paradigm easily accessible to a wide set of large and small organizations that encounter difficulties in capitalizing on Big Data due to the lack of suitable processing capabilities. From a technical perspective, the goal is to develop a software toolbox, the DataCloud toolbox, comprising new languages, methods, infrastructures, and software prototypes for discovering, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources in a manner that makes execution of Big Data pipelines traceable, trustable, manageable, analyzable, and optimizable.
Apart from the technical results, the project has advanced the design of five Business Cases to demonstrate the use of the DataCloud toolbox. The project has delivered a detailed technical specification of the Business Cases (including the market requirements) and details about how the toolbox will be used to implement them. Furthermore, the project has produced a detailed definition of the Big Data pipelines that will be developed for each Business Case and has identified and started the implementation of the core components and services that are involved. The five Business Cases are: 1) Smart Mobile Marketing Campaigns (SMARK), 2) Automatic Live Sports Content Annotation (MOGSPORTS), 3) Digital Health System (TLUHEALTH), 4) Products Development in Ceramic Engineering (P-DICE), and 5) Analytics of Manufacturing Assets (AMANS).
Results expected by the end of the project include six tools: DIS-PIPE (discovery of the structure of Big Data pipelines from data sources); DEF-PIPE (textual and graphical description of Big Data pipelines); SIM-PIPE (simulation of container-based Big Data pipelines); R-MARKET (provisioning resources from the Computing Continuum); DEP-PIPE (adaptive, secure and scalable orchestration of data pipelines); and ADA-PIPE (pipeline scheduling and adaptation). These tools will be combined in the DataCloud toolbox, which will be validated in five business products and services: SMARK (smart data pipeline implementation for mobile digital marketing campaigns management); MOGSPORT (platform for automatic metadata enrichment in live sports events); P-DICE (framework for discovering production data pipelines in the sanitary-ware industry); TLUHEALTH (Telecare/Telehealth services provided as SaaS); and AMANS (analytics of manufacturing assets).
The potential impact targets several groups: Data/ICT Industries (time and cost saved in using Big Data pipelines, easier Big Data pipeline lifecycle management for relevant stakeholders, optimization of Big Data processing); Data Scientists (seamless use of the Computing Continuum infrastructure for deploying Big Data pipelines); Business experts (possibility to get involved in the definition, simulation, and deployment of Big Data pipelines); DevOps/DataOps (increased productivity and quality of system deployment and maintenance); Resource providers (novel ways to monetize their resources in a resource marketplace); Policy makers (more effective decision-making procedures based on cross-sectorial Big Data and heterogeneous infrastructures); Entrepreneurs (increased business opportunities related to innovative services and apps); and Society at large (advancing research and applying innovative technologies that take the best of breed from the Big Data and Computing Continuum domains).