At the end of the project, DataCloud delivered the final implementation of the DataCloud toolbox, which covers both design-time and run-time aspects of Big Data pipeline deployment. The toolbox features were developed based on requirements from five diverse Business Cases, a study of the state of the art, validation feedback from project partners and external users, and the project review of the first reporting period. The tools are designed to be used either as an integrated set of components through the DataCloud integrated UI, or separately, as stand-alone tools supporting well-established standards and technologies. The toolbox comprises six tools: DIS-PIPE for pipeline discovery and conformance checking; DEF-PIPE for data pipeline specification and parametrization; SIM-PIPE for pipeline testing, validation, simulation and configuration before deployment; R-MARKET for resource provisioning of trusted and untrusted resources; DEP-PIPE for Big Data pipeline deployment; and ADA-PIPE for pipeline scheduling and run-time adaptation. In addition, the project delivered a domain-specific language that extends traditional data workflow specification standards with the additional features needed for both design-time and run-time support of Big Data pipelines.
To demonstrate the usability and usefulness of the toolbox, the project implemented and deployed five new Business Cases that make use of all DataCloud tools. The pipelines cover a variety of tasks in digital marketing, live media streaming, electronic healthcare, manufacturing and Industry 4.0. Each Business Case specified, implemented and deployed one or more Big Data pipelines that were incorporated into the partners’ heterogeneous technical infrastructures to produce business value. The pipelines were implemented through collaboration between domain experts, data engineers and DataOps specialists, demonstrating the ability of the toolbox to support a wide range of stakeholders. Specifically, SMARK developed and implemented a data pipeline for digital marketing, validated tools for data exploitation, and disseminated results via social media and events, focusing on internal usage for marketing campaigns. MOGSPORTS fully integrated its sports analytics tools with DataCloud, validated them through focus groups and pilots at football matches, and outlined an exploitation plan as part of its communication and dissemination efforts. TLUHEALTH advanced remote patient monitoring, validated DataCloud tools through pilots with real customers that led to commercial contracts, and contributed to scientific publications on data pipelines for patient monitoring. P-DICE improved manufacturing production planning through process mining and cloud computing, validated the results with stakeholders including plant and production managers, and shared them through dissemination activities. AMANS completed toolkit deployment for welding processes, validated it through internal and market assessments, and contributed significantly to scientific conferences and publications highlighting data science solutions in manufacturing.
DataCloud's engagement with the wider community was carried out through a number of channels, including participation in physical events, online presence (video presentations and interviews, blog posts, news articles, press releases, social media posts, etc.), project collaborations and participation in industrial organizations. The project promoted its results to the industrial community through participation in four industrial organizations. DataCloud also collaborated extensively with nine H2020 and HEU projects by promoting project results, incorporating technical concepts related to Big Data pipelines, and integrating tools within project technical architectures. For the scientific community, the project results, including the project's Business Cases, have led to more than 60 scientific publications.