Periodic Reporting for period 1 - CyclOps (Automated end-to-end data life cycle management for FAIR data integration, processing and re-use)
Reporting period: 2024-01-01 to 2025-06-30
CyclOps aims to facilitate the adoption and production of data, models, and services from and for data spaces. By combining semantic technologies, knowledge graphs and human-in-the-loop approaches, CyclOps provides an end-to-end platform that covers all stages of the data lifecycle: discovery, ingestion, processing, governance, interoperability, and sharing with data spaces.
The specific objectives are: 1) to design and develop a trustworthy automated platform for heterogeneous data management and analytics; 2) to adopt federated and privacy-preserving AI techniques, enabling reuse and transfer of models; 3) to provide protocols and semantic governance mechanisms to enforce data rights, FAIR principles and GDPR; 4) to validate the solution in four real-world use cases (Tourism, Green Deal/Climate, Public Procurement, Manufacturing), ensuring relevance and scalability; and 5) to engage with European data spaces and contribute to standards, dissemination, training, and exploitation pathways.
CyclOps will enable organisations to seamlessly provide, cross-reference and analyse machine- and human-generated data from and for data spaces, facilitating the creation of new AI-based applications, value-added services, and scientific knowledge.
Setting the landscape (WP2): The foundational cornerstones for the CyclOps platform were established by analysing the baseline technologies, defining use cases and collecting an initial set of requirements for the platform. With that, the first iteration of the functional architecture was designed, featuring a modular, four-layer structure (User Intent, Runtime, Knowledge, and Interoperability).
User Intent Layer (WP4): A user-assisted intent interface was designed to aid the development of data processing pipelines through intuitive human-machine interaction, and a first version was implemented. This includes the Natural language intent interface (allowing users to express intents in plain language), the User-assisted intent interface (a visual interface to define analytical intentions), the Exploratory Analysis component (enabling interactive data exploration), and the Pipeline Orchestrator (translating structured user intents into execution pipelines).
Knowledge Layer (WP4): This layer operationalises and automates data management throughout the lifecycle, relying on an ontology-based approach and semantic technologies to provide an Integrated Knowledge Base (IKB). The first version was developed including IKB Design tools for the semi-automatic generation of mappings and ontologies, a User Interface tool for further editing and testing mappings and ontologies, IKB Querying components, IKB Exploitation services (Concept Extraction and Categorisation Engine, NLU module for entity recognition/classification, and a Multimodal Explainability module for AI), a Metadata Manager service and a GDPR-compliant framework, defining annotation of processes and data handling steps within the IKB.
Runtime Layer (WP3): This layer is the execution core of CyclOps. The first implementations of its three main components were delivered. In DataOps, progress included data ingestion functions, preprocessing, cleaning and discovery tools, Long-Term Storage and a Data Augmentation tool. In AIOps, repositories and services were developed to operationalise AI workflows: the CyclOps Lab (decentralised algorithm repository), the AI Models Repository and the Optimization Functions Repository. Additional services such as Feature Engineering and an AI Marketplace were also developed. In DEA, the integration of dataClay for distributed data management and PyCOMPSs/dislib for large-scale execution enables transparent pipeline execution across heterogeneous infrastructures.
Interoperability Layer (WP5): This layer enables CyclOps to connect, exchange, and publish data, models, and services with external data spaces. The first integrated release focused on semantic interoperability, secure data exchange, and decentralised identity. A semantic interoperability service was deployed to validate datasets against NGSI-LD and Smart Data Models. For data exchange, a Context Broker was integrated, supported by agents that transform diverse formats into NGSI-LD entities. Trust and access control were reinforced with decentralised identity mechanisms using Verifiable Credentials and DIDs, supporting both person-to-machine and machine-to-machine authentication.
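As an illustration of the format transformation described above, the following is a minimal sketch of how a flat record could be wrapped as an NGSI-LD entity and checked for required attributes. It is not the CyclOps agent implementation: the function names, the sample record and the list of required attributes are hypothetical, and the core context URL is assumed to be the one published by ETSI.

```python
import json

# Assumption: ETSI-published NGSI-LD core context; a real deployment may pin
# a specific version or add a Smart Data Models context as well.
NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def to_ngsi_ld_entity(record, entity_type, id_field):
    """Wrap a flat record as an NGSI-LD entity (hypothetical helper)."""
    entity = {
        "id": f"urn:ngsi-ld:{entity_type}:{record[id_field]}",
        "type": entity_type,
        "@context": [NGSI_LD_CORE_CONTEXT],
    }
    for key, value in record.items():
        if key == id_field:
            continue  # the identifier is already encoded in the URN
        # Every remaining field becomes an NGSI-LD Property.
        entity[key] = {"type": "Property", "value": value}
    return entity


def missing_attributes(entity, required):
    """Return the required attribute names that the entity lacks."""
    return [name for name in required if name not in entity]


# Hypothetical input record and required attributes, for illustration only.
record = {"station": "001", "temperature": 21.5, "name": "Example station"}
entity = to_ngsi_ld_entity(record, "WeatherObserved", "station")
print(json.dumps(entity, indent=2))
print(missing_attributes(entity, ["temperature", "dateObserved"]))
```

A semantic validation step would typically go further, checking the entity against the JSON Schema of the corresponding Smart Data Model before it is sent to the Context Broker.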
Lab testing and Platform integration (WP5): CyclOps carried out lab testing activities combining unit tests, integration tests, and performance benchmarking. Effort was also devoted to integrating the platform. A Continuous Integration/Continuous Deployment pipeline was set up, enabling automated builds, version tagging, and deployment of components in isolated containers. Integration workflows were also defined, leading to the first release of the platform, CyclOps Platform Release 1.
Use cases (WP6): Progress concentrated on the design and preparation of the four use cases for their implementation in the next project phase. The work involved refining user requirements, mapping data sources, identifying stakeholders, and aligning the use case objectives with the CyclOps components. Progress was also made on implementing use-case-specific developments and setting up the infrastructure.