Enabling Data Enrichment Pipelines for AI-driven Business Products and Services

Periodic Reporting for period 2 - enRichMyData (Enabling Data Enrichment Pipelines for AI-driven Business Products and Services)

Reporting period: 2023-10-01 to 2025-09-30

High-quality, rich and meaningful data are crucial to the successful implementation of Artificial Intelligence (AI) and Big Data Analytics (BDA) solutions. Delivering the data required to feed AI and BDA models is costly, difficult, and often constrained by limited data and skill availability. It is widely reported that up to 80% of the effort spent in AI and BDA projects is dedicated to ensuring that data is fit for purpose. Data from a variety of sources must be discovered, understood, selected, cleaned, transformed and integrated before it can be fed into the modeling phase. These activities produce enriched data, which ultimately improves the quality of downstream BDA and AI applications. The data enrichment process is implemented by specifying, deploying and executing data enrichment pipelines over data that can be structured, semi-structured or unstructured, large in volume, and drawn from static or streaming sources. Techniques exist for individual enrichment operations such as data cleaning, linking, feature extraction, classification and semantic annotation. However, the lack of comprehensive approaches and established tools dedicated to data enrichment makes defining, implementing and operating enrichment pipelines difficult for many organizations seeking to improve their BDA and AI applications.

The objective of enRichMyData has been to develop an open software toolbox – the enRichMyData toolbox – comprising practical, robust and scalable components. These components help organizations enrich their data with reference data they may have limited knowledge of, and help data providers make their data reusable and available for data enrichment processes. The toolbox aimed to lower technological entry barriers by supporting the definition of highly scalable and replicable data enrichment pipelines. It does so through a set of tools and infrastructure services that cover the capabilities needed throughout the pipeline lifecycle. The toolbox has made the data enrichment process accessible to a wider set of stakeholders by reducing the level of expertise required and increasing the level of tool support.
For Period 2, work towards the project objectives centred mainly on the consolidation, extension and validation of the enRichMyData toolbox and its constituent tools, with a focus on their mature integration and scientific excellence.
The work involved refining and implementing advanced enrichment functionalities across a total of 22 tools and services, including three new tools developed during this reporting period. Of these, 15 are released as open source, 6 are provided under commercial licences and 1 is proprietary but free to use. This was followed by systematic validation through the execution of the six business cases in near-operational environments.
This process resulted in the delivery of stable and interoperable tool components, supported by empirical evaluation, benchmark testing and the release of high-quality datasets to ensure reproducibility and scientific reliability.
The outcomes of this work are documented in Deliverables D2.2, D3.2 and D4.2, which present the finalised technical architecture and validated performance of the enRichMyData toolbox.
The enRichMyData toolbox was released as a consolidated and mature framework delivering interoperable, scientifically validated components for scalable data enrichment across diverse domains. The final results demonstrate that the toolbox can support end-to-end enrichment workflows. These workflows combine semantic technologies, AI-driven analysis and orchestration mechanisms, and they can be adopted directly in real-world operational settings.
The project outcomes indicate strong potential for long-term impact in domains such as procurement intelligence, industrial process optimisation, healthcare maintenance and innovation monitoring. In these domains, enhanced data quality and semantic interoperability directly support improved decision-making and automation. Additional support for accessing international markets, structured IPR management and the integration of toolbox components into industrial platforms and data spaces will further reinforce long-term sustainability and adoption.