The first phase of the project laid the foundations for studying and maturing new techniques for developing neural networks and porting them to embedded hardware. The first stable version of the AIDGE framework was released in April 2024. Its core module manages fundamental operations, including graph manipulation and model optimization. AIDGE’s version history reflects a steady progression toward more advanced capabilities. Early releases focused on network inference, while subsequent versions introduced training and quantization features, improved the Python and C interfaces, and expanded support for hardware targets. The latest release, version 0.8.0, introduces new functionalities such as Spiking Neural Networks, Post-Training Quantization, Quantization-Aware Training and ONNX simplification.
As an open-source project hosted by the Eclipse Foundation, AIDGE fosters a collaborative environment that benefits developers and researchers in the embedded AI community and places Europe at the forefront of this competitive market.
The applicative part of the project is structured around seven use cases in fields such as autonomous transport, satellite observation, healthcare and smart buildings. Specifications have been drawn up, and generic and specific KPIs have been defined for all use cases. Although Work Package 6 (Implementation and demonstrations) began at the end of Period 1, the use case implementations will be finalized in Period 3, because the developments required to integrate the results of the other work packages will only be completed by then. Period 2 has seen the development of building blocks for the use cases and preparations for implementing the results from the other work packages.
Quantization-aware and hardware-aware training methods have been investigated for object detectors and transformer-based networks. Low-bit post-training quantization methods have also been studied and tested on keyword retrieval and gesture recognition applications. Quantization to 4 bits resulted in very little performance loss (2%).
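To make the idea of low-bit post-training quantization concrete, the sketch below implements uniform symmetric per-tensor quantization, one common PTQ scheme. It is an illustrative assumption, not the specific method studied in the project and not the AIDGE API; the function names are hypothetical. Each float weight is mapped to a signed n-bit integer using a single scale derived from the largest absolute weight, then mapped back to measure the rounding error.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], n_bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric per-tensor quantization to signed n-bit integers."""
    q_max = 2 ** (n_bits - 1) - 1  # 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round each weight to the nearest integer level and clamp to [-q_max, q_max].
    q = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer levels back to approximate float values."""
    return [level * scale for level in q]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest element-wise difference between two weight vectors."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    weights = [0.82, -0.41, 0.05, -0.77, 0.33, 0.0, -0.12, 0.6]
    for bits in (8, 4):
        q, scale = quantize_symmetric(weights, bits)
        err = max_abs_error(weights, dequantize(q, scale))
        print(f"{bits}-bit: levels={q} max error={err:.4f}")
```

Because no weight exceeds the clipping range, the reconstruction error of every weight is bounded by half the scale, which grows as the bit width shrinks. Practical PTQ methods refine this basic scheme (for example with per-channel scales or calibration data) to keep the accuracy loss small at 4 bits.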
New teacher-student distillation methods based on metaheuristics, gradient trajectory matching and dataset compression have been studied. They reduced model size by up to 95% while maintaining or improving performance, enabling smaller networks to be embedded on constrained hardware.
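As background for the teacher-student setting, the sketch below shows the classic soft-target distillation loss (Hinton-style, with a temperature and a weighting factor). This is a swapped-in, generic formulation chosen for illustration; it is not the metaheuristic, trajectory-matching or dataset-compression methods developed in the project, and the parameter values (`temperature=4.0`, `alpha=0.5`) are arbitrary assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; q is assumed strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Weighted sum of a soft (teacher-matching) term and a hard (label) term."""
    # Soft term: match the teacher's temperature-softened distribution.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2
    # Hard term: ordinary cross-entropy against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]
    print(distillation_loss([2.5, 1.2, 0.1], teacher, label=0))
```

During training, the small student network would minimize this loss over the dataset, learning from the teacher's full output distribution rather than only from hard labels; this is what allows a much smaller model to approach the teacher's accuracy.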
Pruning and tensor decomposition methods have also been studied to reduce the size and simplify the structure of large networks. On convolutional networks, they yielded a significant reduction in model size with minimal accuracy loss. Removing weights produced memory savings, and inference times were reduced, although the improvement varied with the layer configuration and the pruning level.
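A minimal illustration of weight removal is unstructured magnitude pruning, sketched below under the assumption that the smallest-magnitude weights matter least. The function names are hypothetical, and this is not necessarily the pruning criterion used in the project. It zeroes a given fraction of the weights and reports the resulting sparsity.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights: List[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2, -0.02, 0.4]
    p = magnitude_prune(w, 0.5)
    print(p)               # [0.9, 0.0, 0.3, 0.0, -0.7, 0.0, 0.0, 0.4]
    print(sparsity_of(p))  # 0.5
```

Zeroed weights only save memory when stored in a sparse format, and they only shorten inference when the kernels can skip them. Structured pruning, which removes whole filters or channels, shrinks the dense tensors directly; this difference may help explain why the measured speed-ups depend on layer configuration and pruning level.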
With the rapid development of AI, Spiking Neural Networks (SNNs) have shown considerable success owing to their distinctive data-processing approach. Because spikes are binary, SNNs avoid costly multiplications: a synaptic weight is simply added to the neuron's membrane potential whenever an input spike arrives. Compression techniques have been investigated to improve SNN efficiency and were tested on an obstacle-detection system that processes LiDAR data. Knowledge distillation and pruning achieved higher compression, but performance dropped by more than 10%. Quantization from 32-bit to 8-bit gave a smaller size reduction (typically 75%) while keeping performance very close to the original (less than 2% loss).
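The claim that binary spikes remove multiplications can be seen in a single integrate-and-fire neuron. The sketch below assumes a linear (subtractive) leak, a fixed threshold and a reset to zero; these choices and the function name are illustrative assumptions, not the SNN model used in the project. Synaptic integration is done purely by adding the weights of inputs that spiked.

```python
from typing import List


def lif_neuron(spike_trains: List[List[int]], weights: List[float],
               threshold: float = 1.0, leak: float = 0.1) -> List[int]:
    """Simulate one integrate-and-fire neuron with a linear leak.

    spike_trains[t][i] is 1 if input i spikes at time step t, else 0.
    Returns the neuron's output spike (0 or 1) at each time step.
    """
    v = 0.0  # membrane potential
    out = []
    for spikes in spike_trains:
        v = max(0.0, v - leak)  # linear leak toward rest (a subtraction)
        # Binary inputs: add the weight of each input that spiked; no multiply.
        for s, w in zip(spikes, weights):
            if s:
                v += w
        if v >= threshold:
            out.append(1)
            v = 0.0  # reset after firing
        else:
            out.append(0)
    return out


if __name__ == "__main__":
    trains = [[1, 0], [0, 1], [0, 0], [0, 0], [1, 1]]
    print(lif_neuron(trains, [0.7, 0.5]))  # [0, 1, 0, 0, 1]
```

The 75% figure for 8-bit quantization follows from storage arithmetic: each weight shrinks from 32 to 8 bits, so the model keeps 8/32 = 25% of its original size.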