Periodic Reporting for period 1 - PANDORA (A Comprehensive Framework enabling the Delivery of Trustworthy Datasets for Efficient AIoT Operation)
Period covered: 2024-04-01 to 2025-09-30
The objectives of the project are:
1. Advance research excellence in the development of resilient, transparent, and human-centred AI approaches towards optimised and autonomous data processing and use.
2. Provide novel methods, mechanisms, and tools for the development of customisable, and trustworthy datasets for model-based AI developments.
3. Support the development and the continual autonomous operations of robust and energy efficient “data in AI” pipelines across the computing continuum.
4. Provide a cross-sector and multi-variant smart data space to realise the PANDORA framework and validate the data-enabled trustworthy AI mechanisms in real life scenarios.
5. Foster synergetic approaches in the EU industrial and scientific research communities and promote international collaboration on efficient and trustworthy AI approaches.
6. Enhance the EU multidisciplinary competencies in the fields of industrial AI, data and robotics and embrace open innovation.
- Scientific and Technical advances across trustworthy AI components. PANDORA delivered major progress in federated representation learning, uncertainty quantification, semi-automated data labelling, synthetic data generation, dimensionality reduction, and energy-efficient continual learning. Several KPIs under Objectives 1–3 were already achieved or exceeded (e.g. +70% federated detection accuracy, +35k× energy gains, +50% improvement in annotation accuracy).
- Deployment of core architecture, data backbone, and first operational testbeds. The project defined the 4+1 PANDORA architecture, delivered the GDPR-compliant Data Collection Mechanism, released initial AaaS/CaaS/IM platform components, and instantiated PANDORA testbeds enabling end-to-end data pipelines for training, inference, monitoring, and orchestration.
- Consolidation of cross-sector pilots and validation framework. All use-case scenarios, requirements, and pipeline blueprints were defined, and GDPR-compliant data flows were established.
Work highlights per WP:
WP2 delivered the complete architectural foundation of the PANDORA framework and produced the requirements analysis framework guiding all pilots. Use-case scenarios, business/data/user requirements and pipeline blueprints were aligned across domains.
WP3 advanced key scientific components: synthetic data generation (tSDG/vSDG), uncertainty quantification (QU-MAD), explainability (GENEO & causal models), automated labelling/completion (NNTL-MVI), and dimensionality reduction/fusion (DRFEC). Multiple KPIs were successfully achieved, with validation across industrial datasets.
WP4 developed core methods enabling resilient and energy-efficient AI pipelines, including continual, domain-informed and explainable AI, continual inference acceleration, federated representation learning, and adaptive distribution of inference tasks. Models were validated on real datasets with significant performance gains.
WP5 delivered key platform components, i.e. AaaS, CaaS, Intent Manager, authentication and UI. Initial integration of training, inference, monitoring, and orchestration pipelines was achieved. Testbeds were instantiated, enabling operational deployment of PANDORA components in industrial settings.
WP6 produced complete pipeline instantiation documents for all pilots, with validation procedures and KPI-driven evaluation variables. Preparatory steps for on-site experimentation were initiated across all use cases.
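The uncertainty quantification work summarised under WP3 (QU-MAD) is not described here in enough detail to reproduce. As a generic illustration of what a UQ component adds to a point predictor, the sketch below uses split conformal prediction, a standard distribution-free technique swapped in for this purpose: absolute residuals on a held-out calibration set define an interval width that, assuming exchangeable data, covers the true value with probability at least 1 − alpha. The function names `conformal_quantile` and `prediction_interval` and the example values are hypothetical and do not reflect how QU-MAD works.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile of absolute calibration residuals |y - y_hat|."""
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    scores = sorted(abs(r) for r in residuals)
    return scores[k - 1]


def prediction_interval(y_hat: float, q: float) -> Tuple[float, float]:
    """Symmetric interval around a point prediction: y_hat - q to y_hat + q."""
    return (y_hat - q, y_hat + q)


if __name__ == "__main__":
    # Hypothetical calibration set: observed values and a model's predictions.
    y_true = [10.4, 9.9, 10.3, 9.8]
    y_pred = [10.0, 10.0, 10.0, 10.0]
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    q = conformal_quantile(residuals, alpha=0.5)
    # Interval for a new (hypothetical) point prediction.
    print(prediction_interval(10.1, q))
```

The interval width comes only from calibration residuals, so this kind of wrapper can sit on top of an existing model without retraining it; a calibrated UQ component is one way to support the operator trust mentioned under the potential impacts.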
- Breakthroughs in federated and continual learning efficiency. Advanced federated representation learning methods achieved substantially higher anomaly detection performance than existing FL baselines. Novel MAML-based continual learning techniques reduced training time and energy consumption, enabling autonomous and low-cost model updates across edge–cloud systems.
- Uncertainty Quantification and Explainability methods for Industrial AI. The QU-MAD component provided significant improvements in probabilistic accuracy. The GENEO-based causal reasoning and the ChronoEpilogi algorithm delivered domain-informed explainability advances, achieving high conciseness, fidelity, and stability in explanations.
- Automated data labelling, completion and dimensionality reduction. PANDORA developed a novel neural transfer-learning framework that improved annotation accuracy and reduced manual interventions for missing-value reconstruction. The DRFEC component achieved >50% dimensionality reduction while preserving predictive performance, outperforming mainstream DR techniques.
- Synthetic data generation for visual and time-series data. PANDORA developed components that generate realistic, customisable synthetic datasets, supporting training robustness and data augmentation and reducing dependency on costly real data.
- Real-time inference acceleration and adaptive edge–cloud optimisation. Novel continual inference models (Continual Nyströmformers, DeepCoT) reduced inference and execution times beyond those of current state-of-the-art streaming inference systems. Furthermore, dynamic placement mechanisms reduce inference latency and energy consumption through cluster-aware model distribution.
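For readers less familiar with the FL baselines referred to above, the sketch below shows the standard Federated Averaging (FedAvg) aggregation step, in which a server combines the parameter vectors trained at each client, weighting each client by its number of local samples. It is a generic textbook illustration, not PANDORA's federated representation learning method; the function name `fedavg`, the flat-list parameter representation and the example values are hypothetical.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg server aggregation)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        share = n / total  # clients with more local data get a larger say
        for i, value in enumerate(w):
            global_w[i] += share * value
    return global_w


if __name__ == "__main__":
    # Hypothetical round: three edge sites holding different amounts of data.
    clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    sizes = [100, 100, 200]
    print(fedavg(clients, sizes))  # → [3.5, 4.5]
```

Only model parameters leave each site in this scheme, never raw data, which is the property that makes federated approaches attractive for the GDPR-constrained pilots described in this report.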
Potential Impacts:
- Higher efficiency: Reduced data processing costs, fewer human interventions, and improved model performance directly support predictive maintenance, quality inspection, and safety-critical operations.
- Trustworthy, transparent AI adoption: Explainability and UQ capacities build operator trust and regulatory alignment.
- Reduced dependency on real data: Synthetic data and automated labelling lower the barriers for AI deployment in data-sparse sectors.
- Energy-efficient AI: Reductions in computational cost support sustainability and lower environmental impact.