Periodic Reporting for period 2 - DataCloud (ENABLING THE BIG DATA PIPELINE LIFECYCLE ON THE COMPUTING CONTINUUM)
Reporting period: 2022-01-01 to 2023-12-31
DataCloud created a novel paradigm for Big Data pipeline processing over heterogeneous resources spanning the Cloud/Edge/Fog Computing Continuum, covering the complete lifecycle of managing Big Data pipelines. DataCloud makes this paradigm easily accessible to a wide range of large and small organizations that have difficulty capitalizing on Big Data. From a technical perspective, the goal was to develop a software toolbox comprising new languages, methods, infrastructures, and software prototypes for discovering, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources, in a manner that makes the execution of these pipelines traceable, trustworthy, manageable, analyzable, and optimizable.
To demonstrate the usability and usefulness of the toolbox, the project implemented and deployed five new business cases that make use of all DataCloud tools. The pipelines cover a variety of domains, including digital marketing, live media streaming, electronic healthcare, manufacturing, and Industry 4.0. Each business case specified, implemented, and deployed one or more Big Data pipelines that were incorporated into the partners’ heterogeneous technical infrastructures to produce business value. The pipelines were implemented through a collaboration between domain experts, data engineers, and DataOps specialists, demonstrating the toolbox’s ability to support a wide range of stakeholders. Specifically, SMARK developed and implemented a data pipeline for digital marketing, validated tools for data exploitation, and disseminated results via social media and events, with a focus on internal use in marketing campaigns. MOGSPORTS fully integrated its sports analytics tools with DataCloud, validated them through focus groups and pilots at football matches, and outlined an exploitation plan as part of its communication and dissemination efforts. TLUHEALTH advanced remote patient monitoring, validated DataCloud tools through pilots with real customers that led to commercial contracts, and contributed to scientific publications on data pipelines for patient monitoring. P-DICE improved manufacturing production planning through process mining and cloud computing, validated the results with stakeholders including plant and production managers, and shared them through dissemination activities. AMANS completed the toolkit deployment for welding processes, validated it through internal and market assessments, and contributed significantly to scientific conferences and publications highlighting data science solutions in manufacturing.
DataCloud’s engagement with the wider community has been carried out through a number of channels, including participation in physical events, an online presence (video presentations and interviews, blog posts, news articles, press releases, social media posts, etc.), project collaborations, and participation in industrial organizations. The project promoted its results to the industrial community through its participation in four industrial organizations. DataCloud also collaborated extensively with nine H2020 and Horizon Europe (HEU) projects: promoting project results, incorporating technical concepts related to Big Data pipelines, and integrating tools within their technical architectures. In the scientific community, the project’s results, including its business cases, have led to more than 60 scientific publications.