Periodic Reporting for period 2 - STELAR (Spatio-TEmporal Linked data tools for the AgRi-food data space)
Reporting period: 2024-03-01 to 2025-08-31
The STELAR project designs, develops, evaluates, and showcases an innovative Knowledge Lake Management System (KLMS) that supports a holistic approach to FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready (high-quality, reliably labelled) data. The STELAR KLMS makes it possible to (semi-)automatically turn a raw data lake into a knowledge lake by: (a) enhancing the data lake with a knowledge layer; and (b) developing and integrating a set of data management tools and workflows. The KLMS combines human-in-the-loop and automatic approaches, leveraging the background knowledge of domain experts while minimizing their involvement. An organization, such as a data-intensive SME or the operator of a data marketplace, can use the STELAR KLMS to make its data assets more ready for use in AI applications and for sharing and exchange within a common data space.
The STELAR KLMS is pilot-tested in diverse, real-world use cases in the agrifood domain, one of the domains of strategic societal and economic importance identified in the European Strategy for Data. The food supply chain covers all stages from production to transport, distribution, marketing, and consumption. The agrifood domain therefore involves many stakeholders, including producers, advisors, machinery manufacturers, processing actors, inspectors, certification authorities, insurance companies, and governmental agencies, all of whom have an interest in, or even a legal obligation to, exchange and share data. The project conducts three pilots, covering different stages of the food chain, involving and combining different types of data, and addressing different stakeholders and user needs: (1) Risk prevention in food supply lines, integrating worldwide food-safety-related data sources; (2) Early crop growth predictions, integrating current and historical satellite, hyperspectral, meteorological, and synthetic data; (3) Timely precision farming interventions, integrating different types of sensor data from the field.
To improve data linking, we have developed a library that offers several algorithms for schema and entity matching. We have also improved efficiency and scalability, and have experimentally compared several pre-trained language models on various benchmark datasets. We have worked on fusing data from multiple satellite sources with different characteristics. We have also developed an efficient and customizable library for discovering complex correlations.
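As a rough illustration of what entity matching involves, the following minimal sketch pairs records from two lists by token-set Jaccard similarity. It is not the STELAR library or its API: the function names `jaccard` and `match_entities`, the threshold value, and the sample records are all hypothetical, and a simple set-overlap measure stands in here for the algorithms and pre-trained language models mentioned above.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1], ignoring case and word order."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # two empty strings carry no evidence of a match
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_entities(left, right, threshold=0.8):
    """Pair each left record with its most similar right record.

    A pair is kept only if its similarity reaches the (hypothetical) threshold.
    """
    matches = []
    for record in left:
        # default=None covers the case where `right` is empty
        best = max(right, key=lambda cand: jaccard(record, cand), default=None)
        if best is not None and jaccard(record, best) >= threshold:
            matches.append((record, best))
    return matches


if __name__ == "__main__":
    # Hypothetical product names from two food-safety data sources.
    source_a = ["Extra Virgin Olive Oil", "whole milk"]
    source_b = ["olive oil extra virgin", "sunflower oil"]
    print(match_entities(source_a, source_b))
```

A set-based measure like this is cheap and order-insensitive, but it misses synonyms and spelling variants, which is one reason to consider language-model-based matchers for real data.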
To increase the AI-readiness of data, we have made progress on several domain-specific tasks. We have addressed the problem of food entity extraction from unstructured data from several sources, improving the results through bias detection and data augmentation. With respect to satellite images, we have designed and examined methods for field segmentation and for crop classification.
For evaluation, we have specified and evaluated several use cases to assess the performance of the KLMS Platform and Toolkit across criteria including efficiency, effectiveness, and usability.