Anomaly Detection combining time series and unstructured data using AI/ML algorithms in a distributed cloud/edge device scenario

Periodic Reporting for period 1 - SmartAD (Anomaly Detection combining time series and unstructured data using AI/ML algorithms in a distributed cloud/edge device scenario)

Reporting period: 2020-10-01 to 2021-09-30

What is the problem/issue being addressed?

Automated, smart, accurate and secure anomaly detection was identified as the top priority for disrupting the field of IoT device management tools. Intelligent algorithms can detect suspicious behaviour through advanced anomaly detection, device fingerprinting, and awareness of the configuration changes that a device is expected to adhere to.
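Awareness of configuration changes can be illustrated with a minimal sketch. It assumes that the desired configuration is stored as a SHA-256 fingerprint for each managed file. The data layout and the names `fingerprint` and `detect_config_drift` are hypothetical and are not qbee's API. The code compares what is found on a device against the expected fingerprints and reports files that were modified, are missing, or were not expected.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used as the file's fingerprint."""
    return hashlib.sha256(content).hexdigest()


def detect_config_drift(expected: Dict[str, str],
                        current_files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare files observed on a device with their desired fingerprints.

    expected:      path -> SHA-256 digest the device should have (hypothetical format)
    current_files: path -> raw bytes currently found on the device
    """
    current = {path: fingerprint(data) for path, data in current_files.items()}
    return {
        # Present on both sides, but the content differs.
        "modified": sorted(p for p in expected if p in current and current[p] != expected[p]),
        # Required by the configuration, but absent on the device.
        "missing": sorted(p for p in expected if p not in current),
        # Found on the device, but unknown to the configuration.
        "unexpected": sorted(p for p in current if p not in expected),
    }


if __name__ == "__main__":
    desired = {
        "/etc/ssh/sshd_config": fingerprint(b"PermitRootLogin no\n"),
        "/etc/hosts": fingerprint(b"127.0.0.1 localhost\n"),
    }
    observed = {
        "/etc/ssh/sshd_config": b"PermitRootLogin yes\n",
        "/etc/cron.d/unknown": b"* * * * * root /tmp/x\n",
    }
    print(detect_config_drift(desired, observed))
```

An "unexpected" file is not necessarily malicious. In a sketch like this, it is only a signal to combine with the behavioural indicators described above.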

Why is it important for society?

We expect our product to have a positive impact on industry workflows by playing a critical role in mitigating the lack of efficient and secure tools for managing large numbers of IoT devices dispersed across multiple networks and geographies. Cybersecurity is a key requirement for a secure society. In today’s environment, attacks against critical infrastructure can cause huge damage with high ecological and economic costs. The right security and administration competence is not always present in Industry 4.0 companies, and this is where qbee helps companies harden their devices and fill a possible competence gap.

What are the overall objectives?

Develop ML algorithms that build upon published state-of-the-art models, and prepare unstructured and time series data to define baseline “signatures” and alert thresholds for various classes of devices and use cases;

Establish and share common patterns for detection of anomalies and threats on IoT devices between devices and users;

Test and classify the effectiveness of the new algorithms for both edge and cloud computing and select the best ones to integrate in the production platform.
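A baseline "signature" with alert thresholds per device class can be sketched in its simplest statistical form: the mean plus or minus k standard deviations of each metric's history. This is an illustrative assumption, not the project's actual method. The sample layout `(device_class, metric, value)` and the names `build_baseline` and `is_anomalous` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical sample layout: (device_class, metric_name, value).
Sample = Tuple[str, str, float]
Baseline = Dict[Tuple[str, str], Tuple[float, float]]


def build_baseline(history: List[Sample], k: float = 3.0) -> Baseline:
    """Derive (lower, upper) alert thresholds per (device class, metric).

    Thresholds are mean +/- k * population standard deviation of the history.
    """
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for device_class, metric, value in history:
        groups[(device_class, metric)].append(value)
    baseline: Baseline = {}
    for key, values in groups.items():
        mu, sigma = mean(values), pstdev(values)
        baseline[key] = (mu - k * sigma, mu + k * sigma)
    return baseline


def is_anomalous(baseline: Baseline, sample: Sample) -> bool:
    """Flag a sample outside its class thresholds; unknown classes are not flagged."""
    device_class, metric, value = sample
    bounds = baseline.get((device_class, metric))
    if bounds is None:
        return False
    lower, upper = bounds
    return value < lower or value > upper


if __name__ == "__main__":
    history = [("gateway", "cpu_percent", v) for v in [10.0, 12.0, 11.0, 9.0, 13.0]]
    baseline = build_baseline(history, k=3.0)
    print(is_anomalous(baseline, ("gateway", "cpu_percent", 40.0)))  # True
    print(is_anomalous(baseline, ("gateway", "cpu_percent", 12.0)))  # False
```

Because the thresholds are grouped by device class, a CPU load that is normal for one class of device can still raise an alert for another. In this sketch, samples from classes that have no baseline are deliberately left unflagged rather than treated as anomalies.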

As in any machine learning project, the first step was to investigate and map the available data, identify possible analyses, and determine which data was missing. Based on this evaluation, additional data was collected both on the devices and in the backend.
A key finding and recommendation at this stage of the project was that the existing database technology should be replaced. This was a prerequisite for running more advanced analyses that depend on more complex queries. Feature extraction was then used to shape the data and prepare it for the different algorithms.
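Feature extraction can be sketched for the time series case. A raw metric series is cut into fixed, non-overlapping windows, and each window is summarised as a small feature vector that downstream algorithms can consume. The choice of features (mean, standard deviation, min, max, and least-squares slope) and the names `extract_features` and `slope` are assumptions for illustration. They are not the features the project actually used.

```python
from statistics import mean, pstdev
from typing import Dict, List


def slope(values: List[float]) -> float:
    """Least-squares slope of the values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def extract_features(series: List[float], window: int) -> List[Dict[str, float]]:
    """Split a metric series into non-overlapping windows and summarise each one.

    A trailing partial window is dropped so every feature vector covers the same span.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    features = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        features.append({
            "mean": mean(chunk),
            "std": pstdev(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "slope": slope(chunk),  # positive when the metric trends upward
        })
    return features


if __name__ == "__main__":
    cpu = [1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 9.0]
    for row in extract_features(cpu, window=4):
        print(row)
```

Fixed windows keep the feature vectors comparable across devices. Overlapping (sliding) windows would detect changes sooner, but they produce more rows to store and query.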
As a next step, different ML algorithms were examined and compared in terms of performance. Classification algorithms perform very well, but they often need a large set of labeled data that might not always be available, and they can be very compute-intensive. Therefore, lighter-weight algorithms were also tested and implemented. The boundary between statistics and machine learning can be very thin, and a key lesson from the project was to always consider statistical and mathematical models before turning to heavy machine learning.
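One example of a lighter, statistics-first alternative is an exponentially weighted moving average (EWMA) detector. It keeps only a running mean and variance per metric, so its memory use is constant and it could plausibly run on an edge device. This is a generic technique chosen for illustration and is not necessarily one of the algorithms the project tested. The class name `EwmaDetector` and its parameters are hypothetical. The incremental update uses the standard EWMA mean and variance recurrences.

```python
import math
from dataclasses import dataclass


@dataclass
class EwmaDetector:
    """Constant-memory anomaly detector using an exponentially weighted mean/variance.

    alpha:  smoothing factor in (0, 1]; larger values adapt faster.
    k:      number of standard deviations that triggers an alert.
    warmup: samples to absorb before any alert can be raised.
    """
    alpha: float = 0.1
    k: float = 3.0
    warmup: int = 10
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, x: float) -> bool:
        """Score x against the current estimates, then fold it into them."""
        if self.count == 0:
            self.mean = x
            self.count = 1
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        anomalous = self.count >= self.warmup and abs(diff) > self.k * std
        # Incremental EWMA updates of mean and variance.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return anomalous


if __name__ == "__main__":
    detector = EwmaDetector(alpha=0.1, k=3.0, warmup=10)
    readings = [10.0, 11.0] * 20 + [50.0]
    alerts = [i for i, x in enumerate(readings) if detector.update(x)]
    print(alerts)  # [40]
```

In this sketch, anomalous samples are still folded into the estimates, so a long-lasting shift eventually becomes the new normal. Skipping the update when an alert fires is a common alternative if the baseline should not drift toward attack traffic.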
The project's results were turned into a much-improved embedded Linux device management platform with additional smart analysis capabilities.
No progress beyond the state of the art has been achieved, which is as expected for a 12-month project that is also closely connected to production systems.