Periodic Reporting for period 1 - STELAR (Spatio-TEmporal Linked data tools for the AgRi-food data space)
Reporting period: 2022-09-01 to 2024-02-29
The STELAR project will design, develop, evaluate, and showcase an innovative Knowledge Lake Management System (KLMS) to support and facilitate a holistic approach to FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready (high-quality, reliably labelled) data. The STELAR KLMS will make it possible to turn a raw data lake into a knowledge lake (semi-)automatically by: (a) enhancing the data lake with a knowledge layer; and (b) developing and integrating a set of data management tools and workflows. The KLMS will combine human-in-the-loop and automatic approaches, leveraging the background knowledge of domain experts while minimizing their involvement. An organization, such as a data-intensive SME or the operator of a data marketplace, will be able to use the STELAR KLMS to increase the readiness of its data assets for use in AI applications and for sharing and exchange within a common data space.
The STELAR KLMS will be pilot-tested in diverse, real-world use cases in the agrifood data space, one of the data spaces of strategic societal and economic importance identified in the European Strategy for Data. The food supply chain covers all stages from production to transport, distribution, marketing, and consumption. Thus, the agrifood data space involves various stakeholders, including producers, advisors, machinery manufacturers, processing actors, inspectors, certification authorities, insurance companies, and governmental agencies, all of whom have an interest, or even a legal obligation, to exchange and share data. We will conduct three pilots, covering different stages of the food chain, involving and combining different types of data, and addressing different stakeholders and user needs: (1) Risk prevention in food supply lines, integrating worldwide food-safety-related data sources; (2) Early crop growth predictions, integrating current and historical satellite, hyperspectral, meteorological, and synthetic data; (3) Timely precision farming interventions, integrating different types of sensor data from the field.
To improve data linking, we have developed a library that offers several algorithms for schema and entity matching. We have made several improvements in terms of efficiency and scalability, and we have conducted experimental analyses comparing several pre-trained language models on various benchmark datasets. We have worked on fusing data from multiple satellite sources with different characteristics. We have also developed an efficient and customizable library for discovering complex correlations.
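As a rough illustration of what entity matching involves, the sketch below links product names from two sources by token overlap (Jaccard similarity). It is not the STELAR library and does not reflect its API: the function names, the threshold, and the sample records are all hypothetical, and simple token overlap stands in for the pre-trained language models compared in the project's experiments.

```python
def tokens(text):
    """Lower-case a string and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def match_entities(left, right, threshold=0.5):
    """Pair each left record with its most similar right record.

    A pair is kept only if its similarity reaches the (hypothetical)
    threshold; left records without a good enough candidate are skipped.
    """
    matches = []
    for l in left:
        best = max(right, key=lambda r: jaccard(l, r), default=None)
        if best is not None and jaccard(l, best) >= threshold:
            matches.append((l, best))
    return matches


# Hypothetical sample records from two data sources.
left = ["Organic Whole Milk 1L", "Extra Virgin Olive Oil"]
right = ["whole milk organic 1l", "olive oil extra virgin 500ml", "sunflower oil"]

pairs = match_entities(left, right)
for l, r in pairs:
    print(f"{l!r} <-> {r!r} (similarity {jaccard(l, r):.2f})")
```

Comparing every left record against every right record is quadratic in the number of records, which is one reason efficiency and scalability matter for matching at data-lake scale.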
To increase the AI-readiness of data, we have made progress on several domain-specific tasks. We have addressed the problem of food entity extraction from unstructured data from several sources, improving the results through bias detection and data augmentation. With respect to satellite images, we have designed and examined methods for field segmentation and for crop classification.
Towards pilot testing, we have specified several use cases, designed a first version of the architecture for the KLMS Platform, and made progress on integrating and deploying the KLMS Platform and Tools.