Periodic Reporting for period 2 - enRichMyData (Enabling Data Enrichment Pipelines for AI-driven Business Products and Services)
Reporting period: 2023-10-01 to 2025-09-30
The objective of enRichMyData has been to develop an open software toolbox – the enRichMyData toolbox – comprising practical, robust and scalable components to support organizations in enriching their data with reference data they may have limited knowledge of, as well as supporting data providers in making their data reusable and available in data enrichment processes. The aim of the toolbox was to lower the technological entry barriers by providing support for the definition of highly scalable and replicable data enrichment pipelines through a set of tools and infrastructure services related to capabilities needed during the lifecycle of enrichment pipelines. The toolbox has made the data enrichment process accessible to a wider set of stakeholders by reducing the level of expertise required and enhancing the level of tool support.
The work involved the refinement and implementation of advanced enrichment functionalities across a total of 22 tools and services, three of which were newly developed during this reporting period. Of the 22, 15 are released as open source, 6 are provided under commercial licences and 1 is proprietary but free to use. This was followed by systematic validation through the execution of the six business cases in near-operational environments.
This process resulted in the delivery of stable and interoperable tool components, supported by empirical evaluation, benchmark testing and the release of high-quality datasets to ensure reproducibility and scientific reliability.
The outcomes of this work are documented in Deliverables D2.2, D3.2 and D4.2, which showcase the finalised technical architecture and validated performance of the enRichMyData toolbox.
The project outcomes indicate strong potential for long-term impact in domains such as procurement intelligence, industrial process optimisation, healthcare maintenance and innovation monitoring, where enhanced data quality and semantic interoperability directly support improved decision-making and automation. Additional support in accessing international markets, structured IPR management and the integration of toolbox components into industrial platforms and data spaces will further reinforce long-term sustainability and adoption.