Periodic Reporting for period 1 - EXA4MIND (EXtreme Analytics for MINing Data spaces)
Reporting period: 2023-01-01 to 2024-06-30
Taking advantage of the latest trends in Artificial Intelligence (AI) and data analytics, we provide a portable, modular and easy-to-deploy “Extreme Data Database” (EDD), consisting of object-storage and database backends with a unified interface, the “Advanced Query and Indexing System” (AQIS). Leveraging Large Language Models (LLMs), the AQIS enables highly efficient natural language or database language queries across different data backends, each optimised for its specific purpose. Caching, indexing and streaming techniques are employed for efficient information retrieval. The AQIS will address concrete object storage technologies such as S3 or iRODS, several types of SQL databases (PostgreSQL), and NoSQL databases such as OpenSearch, Elasticsearch, MongoDB, and vector and graph databases. AQIS will also enable designing and running pipelines that involve data access, querying and analysis, using the latest workflow and streaming data technologies (Airflow, Kafka, Flink). In this way, we enable the use of Europe's public supercomputers for data analytics at scale. Besides automating data-driven workflows with appropriate orchestration, EXA4MIND has a strong focus on connecting to European data-sharing ecosystems, first and foremost the EOSC, EUDAT and European Data Spaces. FAIR research data handling is being implemented, and methods for trustworthy treatment and controlled sharing of enterprise data are being explored.
The EXA4MIND team co-designs the platform with four demanding application cases in the fields of molecular dynamics, autonomous driving, smart agriculture/viticulture and health/social big data. The co-design process with a variety of research partners from academia, SMEs and industry aims to develop a generic architecture tested in real-life scenarios. In this endeavour, the application cases are expected to make a significant leap forward in their research and development, boosted by EXA4MIND and the EDD and largely based on AI and machine learning. A large molecular dynamics data platform will bring together simulations and experimental observations to make simulations more realistic. Advanced driver assistance systems will have a better basis in extreme amounts of automatically annotated scene data. A smart viticulture system will be able to predict soil moisture content based on satellite imagery and weather data, and a data mining interface for European public health data will show its potential for flexible decision making in public health. All this is accompanied by a continuous ethics assessment and by research on fair AI.