DATA Monetization, Interoperability, Trading & Exchange

Periodic Reporting for period 1 - DATAMITE (DATA Monetization, Interoperability, Trading & Exchange)

Reporting period: 2023-01-01 to 2024-06-30

Studies have shown that up to 95% of organisations suffer from the data decision gap. Similarly, reports state that between 60% and 73% of data is never used for analytics, and that only 32% of companies achieve tangible and measurable value from data. The cost of this gap is hard to overstate, as companies lose the opportunity to understand their customers better, make better pricing decisions, or even avoid fraud.
DATAMITE empowers European companies with a modular, open-source, multi-domain framework to improve DATA Monetization, Interoperability, Trading and Exchange, delivered as software modules, training, and business materials.
The DATAMITE project develops a simple but impactful technical framework that enables European enterprises and public administrations to overcome existing challenges and monetise their data. The core objectives are to help users better monetise and govern their data, and to increase trust in it, through a set of key modules: Data Governance, Quality, Security, Sharing and Supporting Tools. Interoperability with today's leading storage technologies is achieved by building these modules on top of existing open-source components.
DATAMITE will validate the results in 3 use cases comprising a total of 6 pilots, demonstrating that the framework is interoperable and serves different domains and user needs: 1) intra-corporate, multi-domain data exchange; 2) data trading among Data Spaces; 3) integration with other initiatives such as Data Markets, the EU AI-on-demand platform, or DIHs. The pilots cover the agriculture, energy, industrial and manufacturing, and climate sectors.
To achieve this, the project relies on a consortium of 27 partners from 13 countries, bringing together key actors of the Data Value Chain: technical and business stakeholders from Data Spaces, multiple key communities, experts in legal and SSH aspects to guarantee legal and societal compliance, and facilitators of open-source community building and standardisation activities to accelerate transfer to the market.
Technical activities have focused on the design of DATAMITE’s architecture and on the design and development of the different modules and their components. Development has started on most components (some are planned for a second phase) and several have reached an advanced state (e.g. the metadata repository, data governance backend, data ingestion and storage, discovery connectors, KPI library, data anonymisation, logging, and brokers and plugins, to name a few). The approach has been to design and develop the components with their potential consumers and providers in mind, and then proceed to integration. Consequently, frontend tasks started with some delay, as they are mostly consumers of APIs.
The main achievement during this period has been devising DATAMITE’s architecture. The project is ambitious in the functionality and services it offers users, and bringing everything together into a single framework while keeping it as modular as possible was a complex task. Although all the technical tasks have their complexity, we would highlight the effort spent on the metadata model, which gives users room to enrich data as an inherent part of their workflow (as we consider it should be), providing the business view through the use of vocabularies. Along the same lines is the work performed to extend DQV to describe data quality information derived from user-defined rules, for which we will consider standardisation. Also remarkable is the work on the Data Sovereignty component to create tools that facilitate the creation of policies while keeping their enforcement in mind, especially through the EDC connector that will be integrated. Regarding data sharing, the proposed approach of not focusing solely on EDC and dataspaces or Gaia-X has proved valid: publishing data to different portals (e.g. in the several pilots) has gained importance as new possibilities have arisen, without constraining the project to initiatives that may still be at incipient stages.
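As a rough illustration of the kind of policy such tooling would produce, the sketch below expresses a usage-control rule in ODRL, the policy language used by EDC-style connectors. All identifiers and the purpose constraint are illustrative assumptions, not DATAMITE's actual policy model.

from odrl_sketch import enforce  # hypothetical consumer of the policy

# Minimal sketch of an ODRL usage policy of the kind an EDC-style connector
# can enforce. Identifiers and constraint values are illustrative assumptions,
# not taken from the DATAMITE Data Sovereignty component.
usage_policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "urn:example:policy:energy-pilot-001",          # hypothetical id
    "permission": [{
        "target": "urn:example:dataset:smart-meter-readings",
        "action": "use",
        "constraint": [{
            "leftOperand": "purpose",                      # restrict usage purpose
            "operator": "eq",
            "rightOperand": "analytics"
        }]
    }]
}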
Although not all components can yet be integrated into a common flow, the number of components already interacting successfully is promising, opening the door to more elaborate results in the second half of the project.
The main contribution of DATAMITE is providing an open-source framework that combines multiple tools that, at the time of the proposal, could only be found separately. Even now, open-source tools have appeared (e.g. OpenMetadata, LinkedIn DataHub) that combine governance and data quality but have not yet explored the combination with data sharing. DATAMITE will not only integrate well-known efforts like the EDC connector, allowing interaction with Gaia-X and Data Spaces, but will also offer a plugin-based (pull/push) approach that lets users create their own (custom) connectors to other portals, platforms or markets, as is the case in DATAMITE with the AI-on-Demand platform (already available), EOSC (now being rebuilt), third-party open-data portals, or other initiatives such as Pontus-X. Thus, we do not condition the success of the sharing mechanisms on the progress of external tools (i.e. EDC connectors, Gaia-X) but enable this often-neglected alternative.
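A minimal sketch of what such a pull/push plugin contract could look like is given below; the class and method names are our own illustration, not DATAMITE's published API.

from abc import ABC, abstractmethod

class SharingPlugin(ABC):
    """Connector to an external portal, platform or marketplace (illustrative)."""

    @abstractmethod
    def push(self, dataset_metadata: dict) -> str:
        """Publish a dataset description to the target portal; return its remote id."""

    @abstractmethod
    def pull(self, query: str) -> list[dict]:
        """Discover dataset descriptions on the target portal that match a query."""

class AIoDPlugin(SharingPlugin):
    """Hypothetical plugin targeting the AI-on-Demand platform."""

    def push(self, dataset_metadata: dict) -> str:
        # Map the internal metadata to the portal's schema and call its API.
        raise NotImplementedError

    def pull(self, query: str) -> list[dict]:
        # Query the portal's catalogue and map results to internal metadata.
        raise NotImplementedError

Under this contract, adding a new target portal means implementing one class, so sharing does not depend on the readiness of any single external initiative.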
The main result that can be presented is the architecture, which illustrates how DATAMITE improves on current open-source alternatives. The metadata model, based on DCAT and extending it, is also a relevant contribution: current metadata vocabularies are mainly designed for publishing data into public, market-like catalogues, not so much for intra-company cataloguing and exploitation. DATAMITE’s metadata model leverages DCAT and extends it to offer finer-grained detail, especially for multi-artifact datasets with different levels of complexity.
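To make this concrete, the sketch below (Python with rdflib) describes a two-artifact dataset in plain DCAT and adds one finer-grained property. The dm namespace and the artifactRole property are assumptions for illustration, not the published DATAMITE model.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Hypothetical extension namespace; the real DATAMITE model may differ.
DM = Namespace("https://example.org/datamite#")

g = Graph()
dataset = URIRef("https://example.org/dataset/wind-farm-telemetry")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Wind farm telemetry")))

# Standard DCAT: one dcat:Distribution per downloadable artifact.
for name, role in [("readings.parquet", "primary"),
                   ("sensor-layout.geojson", "context")]:
    dist = URIRef(f"https://example.org/dist/{name}")
    g.add((dataset, DCAT.distribution, dist))
    g.add((dist, RDF.type, DCAT.Distribution))
    g.add((dist, DCTERMS.title, Literal(name)))
    # Illustrative finer-grained extension: the artifact's role in the dataset.
    g.add((dist, DM.artifactRole, Literal(role)))

print(g.serialize(format="turtle"))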
Additionally, regarding data quality, DATAMITE is working on extending DQV, the W3C Data Quality Vocabulary, to include user-defined metrics and rules. This extension, denoted DDQV (DATAMITE DQV), is quite advanced and will be proposed to the community. Similarly, we are also working on proposing a series of data quality categories and dimensions as part of this standard, given the lack of a standard approach in this matter.
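As a rough illustration of the direction (the published DDQV extension may differ), the sketch below attaches a user-defined rule to a DQV metric and records a measurement; the ddqv namespace, its ruleExpression property, and the completeness dimension are assumptions.

from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD

DQV = Namespace("http://www.w3.org/ns/dqv#")    # W3C Data Quality Vocabulary
DDQV = Namespace("https://example.org/ddqv#")   # hypothetical extension namespace

g = Graph()

# A user-defined metric, placed in a user-defined quality dimension.
metric = URIRef("https://example.org/metric/non-null-customer-id")
g.add((metric, RDF.type, DQV.Metric))
g.add((metric, RDFS.comment, Literal("Share of rows with a non-null customer_id")))
g.add((metric, DQV.inDimension, DDQV.completeness))  # illustrative dimension

# Illustrative DDQV addition: the executable rule behind the metric.
g.add((metric, DDQV.ruleExpression, Literal("customer_id IS NOT NULL")))

# A standard DQV measurement of that metric over a dataset.
measurement = BNode()
g.add((measurement, RDF.type, DQV.QualityMeasurement))
g.add((measurement, DQV.isMeasurementOf, metric))
g.add((measurement, DQV.computedOn, URIRef("https://example.org/dataset/crm-extract")))
g.add((measurement, DQV.value, Literal(0.97, datatype=XSD.double)))

print(g.serialize(format="turtle"))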
Finally, the project’s overall success will partly depend on the maturity of external tools, like the already-mentioned EDC connectors, and on the level of usability achieved by DATAMITE’s catalogue. Likewise, properly creating data products remains challenging when they are considered beyond static datasets, i.e. as data services, for which quality estimations must also be provided.
DATAMITE Framework Architecture