Periodic Reporting for period 2 - MobiSpaces (New data spaces for green mobility)
Reporting period: 2024-03-01 to 2025-08-31
• A data governance platform for mobility data, comprising data-related operations (semantic modelling, data transformation, data cleaning, data provenance) as well as security-related operations for both design-time security and real-time privacy preservation. A mechanism for automatic deployment of data-related pipelines in multi-cluster settings that include edge clusters is also provided.
• A set of declarative querying systems that support the concept of “SQL on everything” and are tailored for mobility data. These include systems such as MobilityDB and PyMEOS for spatio-temporal data, NoDA for querying NoSQL stores, and LeanXcale for high-throughput transaction processing.
• An end-to-end framework for the design and deployment of AI workflows. It includes an AI workflow builder that assigns workflows to a smart resource allocator, which seeks the optimal deployment in a given cluster setting, taking into account availability, trustworthiness and energy consumption metrics.
• An edge analytics suite of specialized algorithms for mobility analytics, spanning edge analytics, federated learning, explainable AI for spatio-temporal data, and privacy-aware visual analytics.
• Integration of the above technologies with IDS-compliant data connectors, to showcase the applicability of MobiSpaces technology in the context of European data spaces.
• Evaluation and validation in five (5) challenging, commercial use cases, covering two (2) mobility domains: urban and maritime.
1. A data pipeline for making mobility data FAIR, exploiting semantic representation, metadata annotation, data interlinking and data provenance.
2. The Security Risk Modeler, a tool for monitoring security at design-time.
3. Open-source libraries, such as PyMEOS, for processing mobility data, including streaming mobility data, that are suitable for running in edge environments.
4. A novel format for cloud-based storage, called TrajParquet, that extends Apache Parquet to support trajectories.
5. An AutoML approach that enables automatic selection of the best clustering algorithm for a given tabular dataset.
6. A highly efficient data warehousing solution for storing and querying AIS datasets that span several years, thus allowing easy and efficient data analytics on historical vessel positional data.
7. An extremely efficient suite of edge analytics algorithms that outperform existing solutions by several orders of magnitude, owing to an optimized implementation.
8. Techniques for adapting explainable AI methods for the domain of spatial and spatio-temporal data.
9. Several federated learning algorithms for mobility-related applications, including data cleaning, anomaly detection, and future location prediction.