CORDIS - EU research results

SEcure Decentralised Intelligent Data MARKetplace

Periodic Reporting for period 1 - SEDIMARK (SEcure Decentralised Intelligent Data MARKetplace)

Reporting period: 2022-10-01 to 2024-03-31

Context:
The EU data economy has grown tremendously and is forecast to reach 800 billion euros in 2025. Data are becoming the new currency, exchanged as products or services in marketplaces. Data markets are predicted to reach a size of 100 billion euros in 2025.
Existing data marketplaces are centralised, store the data in the cloud, provide limited or no guarantees about data quality, and are governed by single entities that make the rules. SEDIMARK aims to build a secure, trusted and intelligent decentralised data and services marketplace, based on Distributed Ledger Technology (DLT) and Artificial Intelligence (AI). SEDIMARK enables distributed heterogeneous data within the EU to be easily and seamlessly linked, shared and exploited for diverse business and research scenarios. SEDIMARK builds upon the concept of FAIR data, ensuring that data are of the highest quality, unbiased, enriched and annotated, so that they can be discovered, accessed, and easily reused.
SEDIMARK includes a distributed registry of resources (data/services) stored on edge systems, close to where they are generated and where the data are cleaned, labelled, validated and anonymised. Security is applied through strong access control and privacy techniques for data minimisation and purpose limitation, exploiting blockchain for enforcing trust, decentralised identities, and data verification. Energy-efficient AI techniques will be used for automated data quality management, labelling and classification of data, as well as for providing (distributed) analytics and advanced services on top of the data. Semantic interoperability based on common ontologies and data models will allow the easy and efficient discovery, sharing and federation of heterogeneous data from multiple sources.
The system is built on top of existing platforms of the consortium, starting from TRL5, and will be tested and demonstrated in four real-world scenarios, reaching TRL8: (1) Mobility Digital Twin in Helsinki, (2) Urban bike mobility planning in Santander, (3) Valorisation of energy consumption and customer reactions/complaints in Greece, and (4) Valuation and commercialisation of water data.
Objectives:
1. A decentralised infrastructure for a data and service marketplace (Decentralisation)
2. A common ontology and a complete AI-based data management toolset, for curating heterogeneous data, enabling their efficient reuse in EU data spaces (Interoperability, Data quality)
3. Secure storage, discovery, quality and confidence ranking of data in distributed platforms (Trust, data integrity)
4. Distributed Green AI techniques as a Service that allow the efficient and secure processing of large amounts of data from remote platforms (Decentralisation, Intelligence)
5. Secure and fair access to data using strong access control, anonymisation, data minimisation exploiting DLT (Trust, DLT)
6. A ready-to-market platform, including data sharing incentivisation and monetisation (Exploitation, Open source)
The main achievements of the SEDIMARK project during this period can be summarised as follows:
1. SEDIMARK Reference Architecture: an initial version of the SEDIMARK architecture (deliverable D2.2) was produced following a common methodology, which started with the definition of the project use cases and the requirements elicitation process. This first complete version of the SEDIMARK functional architecture describes the main functional and system components of the SEDIMARK decentralised marketplace.
2. SEDIMARK Common Ontology: FIWARE Smart Data Models, such as the Data Quality model, are used to enrich the content of existing datasets with the output of the data processing pipeline. For marketplace offerings, an RDFS/OWL ontology built by integrating well-known ontologies and models (ODRL, DCAT, FOAF, DCT) was proposed to allow the discovery and serving of the different marketplace assets.
3. SEDIMARK data pipeline: the pipeline implements a set of data quality metrics, alongside a number of data cleaning tools and modules for orchestrating the pipeline and visualising its results. Implementation has already progressed significantly, with existing modules for data profiling, data cleaning (anomaly detection, missing value imputation, deduplication), and data orchestration, as well as data augmentation (data synthesis) and feature engineering (dimension reduction and feature selection).
4. DLT infrastructure: a DLT infrastructure was implemented to provide trustworthy, non-repudiable and immutable information about Participants and Offerings. The stored information is validated by a set of decentralised nodes, providing improved resilience. The implemented security mechanisms can maintain asset integrity and asset origin throughout the lifecycle of the asset, providing an additional layer of assurance to asset consumers.
5. Distributed machine learning: SEDIMARK developed two frameworks for distributed training of machine learning models, one for federated learning and one for gossip learning. The two frameworks cover different scenarios, depending on how the training process is initiated and on the ML expertise of the framework's user.
6. Proofs of Concept: seven independent Proof of Concept (PoC) scenarios have been defined and implemented to support the basic functionalities of the SEDIMARK system: i) Participants Onboarding, ii) Data quality improvement, iii) Offering lifecycle, iv) AI-related scenario, v) Asset (data) exchange, vi) User interfaces (GUIs), and vii) Open data enabler. Although these scenarios operate independently, they are often interconnected, demonstrating the synergy between different modules and components of the system.
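The three data cleaning steps named in item 3 (deduplication, missing value imputation, anomaly detection) can be illustrated with a minimal Python sketch. This is not SEDIMARK's actual tooling: the function names, the record layout, the median-based imputation and the modified z-score rule are all assumptions chosen for illustration, and the sample bike-count records are invented.

```python
from statistics import median


def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_missing(rows, field):
    """Replace None in `field` with the median of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def flag_anomalies(rows, field, threshold=3.5):
    """Add an 'anomaly' flag using a modified z-score based on the
    median absolute deviation (MAD). The threshold is an assumed default."""
    values = [r[field] for r in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    flagged = []
    for r in rows:
        score = 0.0 if mad == 0 else 0.6745 * abs(r[field] - med) / mad
        flagged.append({**r, "anomaly": score > threshold})
    return flagged


# Hypothetical records (invented for this example).
rows = [
    {"station": "A", "count": 10.0},
    {"station": "A", "count": 10.0},   # exact duplicate
    {"station": "B", "count": None},   # missing value
    {"station": "C", "count": 12.0},
    {"station": "D", "count": 11.0},
    {"station": "E", "count": 500.0},  # suspicious reading
]

cleaned = flag_anomalies(impute_missing(deduplicate(rows), "count"), "count")
for record in cleaned:
    print(record)
```

The steps run in a fixed order here so that duplicates do not skew the imputed value and imputed values exist before anomaly scoring; a real pipeline would make the order and methods configurable.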
The SEDIMARK project has achieved advancements beyond the state of the art in various fields.
1. SEDIMARK acknowledges that access to high-quality data is required to build new data technologies and infrastructures, so the trustworthy data-sharing ecosystem of SEDIMARK will provide significant tools for creating data spaces within the EU and for interconnecting them, allowing easy cross-dataspace data exchange and discovery. The decentralised nature of SEDIMARK, with DLT as the backbone, improves overall trust. The SEDIMARK open interfaces provide extensibility with both new tools and new data providers. The provision of distributed AI training frameworks can also convert the marketplace into a trustworthy AI ecosystem, extending the concept of data spaces to share not only data, but also AI models, AI services, results of models and model metadata. Overall, SEDIMARK is expected to contribute to a more secure, data-agile economy.
2. By leveraging NGSI-LD's semantic capabilities and the richness of Smart Data Models, SEDIMARK ensures meaningful and interoperable annotations for improved comprehension and utilisation of data.
3. SEDIMARK offers easy ways to discover rich, interoperable, heterogeneous data and the ability to jointly train distributed machine learning models, so that the models generalise better to scenarios unknown to a given user. This, together with the ability to execute ML models at the edge for faster decision making, can help in crisis scenarios where fast decisions and global knowledge of the situation are required. The option to extend the marketplace with external open datasets will also provide wider access to datasets from different sectors, which can help identify broader patterns and trends related to various societal challenges.
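The difference between the two distributed training styles mentioned in this report, federated and gossip learning, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SEDIMARK frameworks: it shows only the model-merging step (local training is omitted), models are plain lists of floats, and the function names are hypothetical. The federated step uses a sample-size-weighted average in the style of FedAvg; the gossip step uses plain pairwise averaging between peers.

```python
def federated_average(client_weights, client_sizes):
    """Central aggregation: average client parameter vectors,
    weighted by the number of local training samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def gossip_round(node_weights, pairs):
    """Decentralised merging: each listed pair of peers replaces both
    of its parameter vectors with their average; no central server."""
    updated = [list(w) for w in node_weights]
    for a, b in pairs:
        merged = [(x + y) / 2 for x, y in zip(updated[a], updated[b])]
        updated[a] = merged
        updated[b] = list(merged)
    return updated


# Federated: two clients, the second holds three times as much data.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)  # [2.5, 3.5]

# Gossip: three nodes, node 0 talks to node 1, then node 1 to node 2.
nodes = gossip_round([[0.0], [4.0], [8.0]], [(0, 1), (1, 2)])
print(nodes)  # [[2.0], [5.0], [5.0]]
```

In the federated case one coordinator sees every update, whereas in the gossip case knowledge spreads gradually through repeated peer exchanges, which is why the nodes above have not yet converged after a single round.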