CORDIS - EU research results

EXtreme Analytics for MINing Data spaces

Periodic Reporting for period 1 - EXA4MIND (EXtreme Analytics for MINing Data spaces)

Reporting period: 2023-01-01 to 2024-06-30

The EXA4MIND Horizon Europe project builds an Extreme Data analytics platform combining best-of-breed data-management systems with supercomputing infrastructure and European data ecosystems. It is driven by nine partners: two supercomputing centres (IT4I@VSB, CZ; BADW-LRZ, DE), six partners providing, developing and testing pilot application cases for co-design (IT4I@VSB, VALEO, VALEO.AI, CVUT, TERRAVIEW, ALTRNATIV) and two partners specialised in dissemination and technology transfer (AUSTRALO, EURAXENT).

Taking advantage of the latest trends in Artificial Intelligence (AI) and data analytics, we provide a portable, modular and easy-to-deploy “Extreme Data Database” (EDD), consisting of object-storage and database backends with a unified interface, the “Advanced Query and Indexing System” (AQIS). Leveraging Large Language Models (LLMs), the AQIS enables highly efficient natural-language or database-language queries across different data backends, each optimised for its specific purpose; caching, indexing and streaming techniques are employed for efficient information retrieval. It will address concrete object-storage technologies such as S3 or iRODS, several types of SQL databases (PostgreSQL) and NoSQL databases such as OpenSearch, Elasticsearch, MongoDB, and vector and graph DBs. AQIS will also enable designing and running data-access and data-analytics pipelines using the latest workflow and streaming-data technologies (Airflow, Kafka, Flink), involving data access, querying and analysis. In this way, we enable the use of Europe's public supercomputers for data analytics at scale. Besides automating data-driven workflows with appropriate orchestration, EXA4MIND has a strong focus on connecting to European data-sharing ecosystems, first and foremost the EOSC, EUDAT and European Data Spaces. FAIR research data handling is being implemented, and methods for trustworthy treatment and controlled sharing of enterprise data are being explored.
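To make the idea of a unified interface over several specialised backends more concrete, the following is a minimal sketch and not the AQIS implementation. It assumes a hypothetical `UnifiedQueryFacade` class, hypothetical backend names, and in-memory stand-in functions instead of real object stores or databases. It illustrates only two of the ideas named above: routing a query to the backend suited to it, and caching repeated queries. Translating natural language into queries with an LLM is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = dict
# A backend is modelled as a function from a backend-native query to rows.
Backend = Callable[[str], List[Row]]


@dataclass
class UnifiedQueryFacade:
    """Hypothetical facade: one entry point, several specialised backends."""

    backends: Dict[str, Backend] = field(default_factory=dict)
    cache: Dict[Tuple[str, str], List[Row]] = field(default_factory=dict)
    cache_hits: int = 0

    def register(self, name: str, backend: Backend) -> None:
        """Attach a backend under a name chosen by the caller."""
        self.backends[name] = backend

    def query(self, backend_name: str, native_query: str) -> List[Row]:
        """Route a query to one backend, reusing a cached result if present."""
        if backend_name not in self.backends:
            raise KeyError(f"unknown backend: {backend_name}")
        key = (backend_name, native_query)
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        rows = self.backends[backend_name](native_query)
        self.cache[key] = rows
        return rows


# In-memory stand-ins (assumptions for illustration, not real connectors).
def fake_sql_backend(query: str) -> List[Row]:
    return [{"source": "sql", "query": query}]


def fake_object_store(key: str) -> List[Row]:
    return [{"source": "object-store", "key": key}]


facade = UnifiedQueryFacade()
facade.register("sql", fake_sql_backend)
facade.register("objects", fake_object_store)

first = facade.query("sql", "SELECT 1")
second = facade.query("sql", "SELECT 1")  # served from the cache
```

In a real deployment, each stand-in function would be replaced by a connector to an actual storage or database system, and the cache would need an eviction and invalidation policy, which this sketch omits.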

The EXA4MIND team co-designs the platform with four demanding application cases in the fields of molecular dynamics, autonomous driving, smart agriculture/viticulture and health/social big data. The co-design process with a variety of research partners from academia, SMEs and industry aims to develop a generic architecture tested in real-life scenarios. In this endeavour, the application cases are expected to make a significant leap forward in their research and development, boosted by EXA4MIND and the EDD and largely based on AI and machine learning. A large molecular dynamics data platform will bring together simulations and experimental observations to make simulations more realistic. Advanced driver assistance systems will have a better basis in extreme amounts of automatically annotated scene data. A smart viticulture system will be able to predict soil moisture content based on satellite imagery and weather data, and a data-mining interface for European public health data will show its potential for flexible decision making in public health. All this is accompanied by a continuous ethics assessment and by research on fair AI.
We have successfully designed the architecture of our novel EXA4MIND Extreme Data analytics platform to be leveraged by the application cases at our supercomputing centres. To this end, an extensive requirements and data-flow pattern analysis has driven an intensive co-design process. The technical implementation of the platform, which started at the same time, has yielded a data and workflow management toolbox, a preprocessing toolbox and the first version of the AQIS, which already involves multiple data backends and accepts queries in natural language by leveraging LLMs. This has enabled all application cases to run data-driven workflows on the infrastructures of the participating supercomputing sites. First benchmarks have been executed to determine the performance of data systems and data-flow paths in critical situations. These have concentrated on optimising data organisation (comparing different database and storage systems) and on optimising data-transfer and streaming performance (comparing different protocols and mechanisms). The application cases have all developed the first version of their analytics toolchains and user interfaces. Besides Extreme Data analytics, the devised workflows facilitate automated large-scale data transfers and data caching. In addition to our strong focus on uptake of the EXA4MIND platform, the application cases have achieved high levels of success in their areas of research and development.
The first version of the EXA4MIND Platform is a unique system for leveraging a combination of specialised data backends via unified query and data-staging interfaces at supercomputing centres in automated data-analytics workflows. Its AQIS leverages the latest LLM developments for natural-language queries on Extreme Data (besides the usual database-language queries) across different backends. The platform facilitates system usage for academia, SMEs and industry. In the Molecular Dynamics (MD) application case, a novel and FAIR database collecting MD simulation data and experimental observations with appropriate metadata has been developed (public Integrated DAtabase - IDA; Advanced Data Mining System for the Systematic Improvement of MD Simulations - ADAMS4SIMS). It will serve simulation-technique improvements based on data mining, which can facilitate discoveries in molecular biology and medicine. The Autonomous Driving application case has made leaps forward in image segmentation and object recognition based on AI methods. It has collected huge amounts of real-world data from driving vehicles and devised novel methods, evaluating, for example, a combination of image and LiDAR data for metadata annotation of objects in driving scenes, making Advanced Driver Assistance Systems more reliable. The Smart Viticulture application case is succeeding in developing its soil-moisture prediction system. As one important component of a smart-agriculture system, this is intended to reduce the amount of wasted water and increase agricultural yields through optimal water supply. Finally, EXA4MIND’s Public Health application case defines a versatile data-analytics interface allowing for the mining of large amounts of health- and society-related data by dynamically defined query pipelines. Such a system helps public authorities to improve healthcare. All this demonstrates the flexibility of the EXA4MIND Platform as a backend.