
SElf-Adaptive and Automated LEARNing Framework for Smart Sensors

Periodic Reporting for period 1 - SEA2Learn (SElf-Adaptive and Automated LEARNing Framework for Smart Sensors)

Reporting period: 2022-09-01 to 2024-08-31

The SEA2Learn research project has investigated lightweight Continual Learning methods for the post-deployment adaptation of sensor processing algorithms, i.e. Deep Learning models, embedded in low-power sensor nodes. The overall aim was to define new methodologies for extreme-edge devices, such as battery-powered smart sensors. These methodologies seek to overcome the current "train-once-deploy-everywhere" paradigm, in which the local sensor processing algorithms cannot adapt to the user or the external environment. The new learning approaches were examined from both a system and an algorithm perspective, particularly in the context of a voice command recognition use case. We focused on tasks with low availability of human-labeled data, which is a common real-world application scenario.

To reach this ambitious goal, the SEA2Learn project pursued the following research Objectives (Obj):
Obj1. Design an adaptive smart sensor platform, based on Open Source HW/SW components for real-time adaptation.
Obj2. Define efficient mechanisms to continually learn from use-case-specific streams of sensor data under the resource constraints of the tiny devices.
Obj3. Demonstrate a fully automated learning process by leveraging unsupervised learning using new data captured with single-sensor and multi-sensor setups.
The activities to achieve the project's objectives Obj1, Obj2 and Obj3 were conducted within three Work Packages (WPs): WP1, WP2 and WP3, respectively.

The design of the adaptive smart sensor platform was conducted in WP1 and was based on a heterogeneous and ultra-low-power microcontroller platform, GAP9, fabricated by GreenWaves Technologies (the Secondment Institute). The processing unit features a computing cluster of 9 general-purpose RISC-V cores and a convolution accelerator, NE16, both originally designed and made available by the Open-Source Hardware PULP project. Starting from this design, Obj1 was achieved by demonstrating the first ultra-low-power node that can learn new keywords after the Deep Neural Network (DNN) has already been deployed on-device. To achieve this goal, we initially studied the deployment of common keyword spotting (KWS) models on the target microcontroller, the multi-core GAP9 chip. We analyzed the effect of the quantization (half-precision floating point vs. 8-bit integers) and of the type of processing element (multi-core CPUs vs. convolution accelerator) with respect to energy and latency metrics. The task was conducted using the GAP9 development board and the tools provided by GreenWaves Technologies. Then, a second prototype was studied and assembled by combining the processing unit with a microphone sensor and a Bluetooth Low-Energy (BLE) board. This activity was partially carried out during the secondment at GreenWaves Technologies (GWT). Thanks to the support of the application team at GWT, an energy-optimized software state machine was developed to concurrently handle the sensing, processing, and communication tasks. Finally, a set of 4 sensors was used to collect a dataset of voice commands from a group of 20 volunteers.
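To make the quantization trade-off concrete, the sketch below compares the weight-memory footprint of a small KWS-style network under the two precisions studied (half-precision floating point vs. 8-bit integers). The architecture and layer sizes are illustrative placeholders rather than the project's actual model, and the real on-device deployment relies on GreenWaves' toolchain rather than plain PyTorch.

```python
import torch
import torch.nn as nn

# Minimal depthwise-separable CNN in the spirit of common KWS backbones;
# the layer sizes and the 12-class output are illustrative placeholders.
model = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 64, kernel_size=1),                        # pointwise
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 12),
)

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params}")
print(f"fp16 weight memory: {n_params * 2 / 1024:.1f} KiB")  # 2 bytes/weight
print(f"int8 weight memory: {n_params * 1 / 1024:.1f} KiB")  # 1 byte/weight
```

Beyond the halved memory footprint, 8-bit integer inference is what enables the NE16 convolution accelerator to be used at all, which is why the precision choice interacts with the processing-element choice in the energy and latency analysis above.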

The study of continual learning mechanisms (also known as incremental learning) was conducted in WP2 with respect to a use case in the audio domain. We referred to a few-shot continual learning scenario, where the new task is defined by feeding the learning engine with a few labeled examples per class of the new keywords to be recognized. To achieve Obj2, we identified a few-shot continual learning methodology suitable for resource-constrained devices, in which the DNN already deployed on an ultra-low-power device, e.g. a microcontroller, can learn new classes. After reviewing the literature in this domain, we built a PyTorch framework for evaluating multiple training techniques for DNN-based methods (a sketch of the core components follows below). Every method was applied to our audio-based application scenario (keyword spotting). We analyzed multiple loss functions and assessed the effectiveness of the obtained models in an open-set test environment. Concurrently, we analyzed multiple variants of the DNN architectures, e.g. normalization of the output features and size of the embedding vectors, along with different classifiers. The best configurations were then compared against other state-of-the-art techniques in this domain. Afterwards, we extended the evaluation of the most effective continual learning technique to a large set of DNNs of varying model capacity. The methods were benchmarked while varying the number of given examples.
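As an illustration of the evaluated approach, the following sketch combines a triplet-loss-trained embedding encoder with a nearest-class-mean classifier initialized from a few utterances. All shapes, layer sizes, and the margin value are hypothetical stand-ins; only the overall structure (normalized embeddings, triplet loss, few-shot prototypes) reflects the method described here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Hypothetical embedding encoder for audio features (e.g. MFCC patches)."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):
        # L2-normalize the output features, one of the model variants analyzed
        return F.normalize(self.net(x), dim=-1)

encoder = Encoder()
triplet = nn.TripletMarginLoss(margin=0.5)  # margin value is illustrative

# One training step on random stand-ins for (anchor, positive, negative) batches.
anchor, positive, negative = (torch.randn(8, 1, 49, 10) for _ in range(3))
loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()

# On-device few-shot initialization: one prototype per new keyword class.
with torch.no_grad():
    shots = encoder(torch.randn(5, 1, 49, 10))  # 5 utterances of a new keyword
    prototype = shots.mean(0)                   # nearest-class-mean classifier
    query = encoder(torch.randn(1, 1, 49, 10))
    score = F.cosine_similarity(query, prototype.unsqueeze(0))
```

The split matters for the embedded setting: the costly triplet-loss training happens offline on a generic dataset, while the on-device step reduces to a handful of forward passes and an average, which fits a microcontroller's budget.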

Finally, in WP3 we studied a method that uses unsupervised data to improve the recognition accuracy of the algorithms identified in WP2, and we then applied this approach to a multi-sensor dataset obtained from WP1. In more detail, we experimented with our Continual Learning method in a more challenging scenario where, after the few-shot initialization, the DNN was fed with new unsupervised data. The benchmark suite now included more datasets, e.g. HeySnips and HeySnapdragon. We adopted a pseudo-labeling strategy to assign labels to the new data, which were later used to fine-tune the model. After defining the method, we ran experiments to determine the parametrization of the algorithm and to assess its performance in various scenarios. In the end, the method was tested in a multi-sensor scenario. Differently from single-sensor setups, we defined a fusion rule to aggregate the scores from the multiple sensors and assign "network-level" pseudo-labels. We ran experiments on the collected dataset (from WP1), which confirmed the observations made on the single-sensor setup. These results justify the achievement of Obj3. Overall, we introduced the first technique to self-learn a DNN model after on-device deployment, in a multi-sensor context, using new unlabeled data.
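A minimal sketch of such a confidence-thresholded pseudo-labeling step is shown below. The cosine-similarity scoring and the threshold value are assumptions made for illustration; the project's exact parametrization was determined experimentally, as noted above.

```python
import torch
import torch.nn.functional as F

def pseudo_label(embeddings, prototypes, threshold=0.7):
    """Assign each unlabeled embedding to its closest class prototype,
    keeping only confident matches; low-confidence samples (possibly
    open-set negatives) are discarded instead of being learned from."""
    # sims[i, c]: similarity of unlabeled sample i to the prototype of class c
    sims = F.cosine_similarity(
        embeddings.unsqueeze(1), prototypes.unsqueeze(0), dim=-1
    )
    conf, labels = sims.max(dim=1)
    keep = conf >= threshold
    return labels[keep], keep

# Usage: the retained pseudo-labeled samples then drive model fine-tuning.
emb = F.normalize(torch.randn(32, 64), dim=-1)     # 32 unlabeled embeddings
protos = F.normalize(torch.randn(5, 64), dim=-1)   # 5 known keyword classes
labels, keep = pseudo_label(emb, protos)
```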
- We defined a methodology for on-device customization using a few examples in the context of a keyword-spotting application. The algorithm is composed of a DNN-based encoder, trained on a generic dataset using the triplet loss, and a classifier that can be initialized on-device by recording a few utterances of the new custom keywords. This method achieved the highest accuracy for few-shot keyword spotting in open-set test conditions when evaluated on lightweight DNN models for resource-constrained devices, outperforming other strategies based on different learning loss functions and model variants.

This technological innovation brings a new perspective for smart sensor devices. At present, sensor intelligence is tuned a priori, without any knowledge of the user, the task, or the environment. Customizing these sensors typically requires collecting personal data and sharing it with third parties to develop a new algorithm, which is then sent back to the sensor. However, this process does not protect the privacy of personal data, and it creates a dependency on remote services for model updates. Our solution aims to enable new privacy-preserving sensing technologies and products that can be customized in the field through user interaction. To support this initiative, we conducted knowledge transfer activities with a small to medium-sized enterprise (SME) and developed a demonstrator for the technology. Additionally, we have made the code publicly available to empower other SMEs to incorporate this technology into their future products.


- We developed a novel methodology to enhance the recognition accuracy of smart audio sensors deployed in the field by utilizing new pseudo-labeled data and on-device learning. After evaluating the effectiveness of the incremental learning task and the costs associated with labeling new data and conducting on-device training, we concluded that this method can be effectively implemented on an ultra-low-power sensor node. This represents a breakthrough innovation in the field, as it demonstrates the feasibility of learning new concepts directly on-device after deployment, without requiring new labeled data. In contrast, typical approaches rely on frozen deep neural network (DNN) algorithms, which are trained on collected data before deployment. Because these DNNs are static and the training data often do not accurately represent the user-defined tasks, the final solutions tend to underperform in real-world scenarios and lack the ability to adapt to new tasks and environments. Our method provides a viable solution to these challenges by enabling effective on-device learning from new unsupervised data.

Following up on the first innovation, this method enhances the personalization of new sensor products by protecting personal data privacy while exploiting the information carried by low-cost unsupervised data. This approach also lowers the barriers between designers and end-users by allowing users to act as "designers" of their own technology through private interactions with their devices. This additionally helps counteract the biases that often affect AI technology. In this direction, we have established guidelines and initial design principles, which have been communicated to industry stakeholders and researchers. However, further research is needed to bring this technology to a higher Technology Readiness Level (TRL). Specifically, we need to develop more robust learning algorithms, as we have identified potential pitfalls in the current approach that may lead to unstable solutions. Additionally, new hardware and software solutions are required for on-device learning, as current systems rely on power-hungry GPU architectures. In this context, distributing the learning tasks across edge device nodes can also help reduce the energy consumption of present AI technologies, as demonstrated by our project.


- We introduced a wireless audio sensor network designed for collaborative incremental learning using the collected data. Each node in the network is equipped with a processing unit and a wireless interface that shares predicted labels for new data. These predictions are then aggregated to produce network-level labels. Our findings demonstrate that this multi-sensor configuration achieves higher accuracy compared to single-sensor setups by taking advantage of the spatial redundancy provided by the sensors.
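As an illustration of this aggregation step, the sketch below fuses per-node class scores by simple averaging before picking the network-level label. Score averaging and the confidence threshold are assumptions standing in for the fusion rule defined in WP3.

```python
import torch

def network_level_labels(node_scores, threshold=0.6):
    """Fuse per-sensor class scores into network-level pseudo-labels.

    node_scores: tensor of shape (num_nodes, num_samples, num_classes).
    Returns the fused labels and a mask of confidently labeled samples."""
    fused = node_scores.mean(dim=0)     # average scores across sensor nodes
    conf, labels = fused.max(dim=-1)    # pick the best class per sample
    keep = conf >= threshold            # keep only confident network labels
    return labels, keep

# Usage with 4 nodes (as in the collected dataset), 16 samples, 10 classes.
scores = torch.softmax(torch.randn(4, 16, 10), dim=-1)
labels, keep = network_level_labels(scores)
```

Averaging exploits the spatial redundancy mentioned above: a sample that one microphone captures poorly is usually captured well by another, so the fused score is more reliable than any single node's prediction.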

This methodological innovation can impact the digitalization of industries, cities, and home environments. Compared to traditional single-point services that are merely connected to the network, this spatial computing paradigm allows distributed sensor networks to build new personalized services. Research in this direction is still in its infancy, mainly because of the lack of open datasets. Hence, to support the research community, we released a first-of-its-kind dataset of voice commands collected using our wireless audio sensor network.