Multivariate Analysis of Big Data in Software Defined Networks

Project information

MAD-SDN

Grant agreement ID: 893146

DOI

10.3030/893146

Closed project

EC signature date 25 March 2020

Start date 1 March 2021

End date 28 February 2023

Funded under

EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions

Total cost

€ 172 932,48

EU contribution

€ 172 932,48

Coordinated by

UNIVERSIDAD DE GRANADA
Spain

Periodic Reporting for period 1 - MAD-SDN (Multivariate Analysis of Big Data in Software Defined Networks)

Reporting period: 2021-03-01 to 2023-02-28

The networks of the future will be autonomous: self-configuring, self-maintaining, self-securing and self-monitoring. Although many efforts have been made in this area, autonomous networks still need innovative methods that enable them to adapt quickly to rapidly changing conditions in the network environment. The MAD-SDN project focused on two main problems: network traffic classification and anomaly detection in the network security domain. The project proposed approaches based on multivariate big data analysis methods. We performed comprehensive research with the Multivariate Big Data Analysis (MBDA) methodology and several multivariate methods such as Principal Component Analysis (PCA) and Partial Least Squares (PLS). This research revealed important issues that affect the results of Machine Learning (ML) modelling and that are most often overlooked in our research area. A major problem relates to the quality of network datasets: high-quality datasets are the key to the high performance of ML models and their usefulness in real networks. Research results were presented at conferences and discussed with the community. However, our outcomes do not solve all the problems, and this area requires further work.

The project began with training in the Multivariate Big Data Analysis (MBDA) framework; training in multivariate data analysis and machine learning techniques continued throughout the project. Research related to the core of the project included SDN data analysis, Principal Component Analysis (PCA)-based MBDA and Partial Least Squares (PLS)-based MBDA. The MBDA methodology consists of five steps: data parsing, data fusion, detection, diagnosis, and deparsing. It has been proposed as a solution that exposes root causes in anomaly detection and traffic classification problems, as opposed to black-box models such as deep learning. In the detection phase we used two techniques: PCA and PLS. We experimented with publicly available datasets. Regarding the PCA-based MBDA methodology, we wrote a paper, which is under revision. We are still working on PLS-based MBDA: we performed initial experiments, but the results are too preliminary at this point.
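To make the detection step more concrete, the following is a minimal sketch of PCA-based anomaly detection. It is not the project's implementation. It assumes two statistics commonly used in PCA-based monitoring (a Hotelling-style D-statistic on the scores and a Q-statistic on the residuals), and it assumes control limits set as an empirical percentile of the calibration data. The function names, the synthetic data and the 99th-percentile threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model on calibration data X (observations x features)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - mean) / std  # autoscaled data
    # SVD of the autoscaled data; rows of Vt are the loading vectors
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                         # loadings (features x components)
    var = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance of each score
    return {"mean": mean, "std": std, "P": P, "var": var}

def d_and_q(model, X):
    """Return the D-statistic and Q-statistic for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                       # scores in the model subspace
    D = np.sum(T ** 2 / model["var"], axis=1)
    E = Z - T @ model["P"].T                 # residuals outside the subspace
    Q = np.sum(E ** 2, axis=1)
    return D, Q

def detect(model, X_cal, X_new, percentile=99.0):
    """Flag rows of X_new whose D or Q exceeds an empirical control limit."""
    D_cal, Q_cal = d_and_q(model, X_cal)
    d_lim = np.percentile(D_cal, percentile)  # assumed: empirical limits
    q_lim = np.percentile(Q_cal, percentile)
    D, Q = d_and_q(model, X_new)
    return (D > d_lim) | (Q > q_lim)

# Synthetic calibration data: features 0 and 1 are strongly correlated.
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(500, 6))
X_cal[:, 1] = X_cal[:, 0] + 0.1 * rng.normal(size=500)
model = fit_pca(X_cal, n_components=2)

# New observations: the first one breaks the correlation structure.
X_new = X_cal[:5].copy()
X_new[0, 1] = X_new[0, 0] + 10.0
flags = detect(model, X_cal, X_new)
```

In this sketch the first new observation is flagged because it violates the correlation learned from the calibration data. A diagnosis step would then inspect which features contribute most to the D or Q value of a flagged observation.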

We have also worked on other multivariate techniques based on linear matrix factorization that can be used in the network analytics process: apart from PCA and PLS, we are using ANOVA Simultaneous Component Analysis (ASCA), Parallel Factor Analysis (PARAFAC) and sparse methods. We are preparing both a conference paper and a journal paper. Moreover, we have worked on the extension of MBDA to Federated Learning.

Soon after the beginning of the project, we noticed that machine learning results strongly depend on the quality of the dataset, and that there is a shortage of high-quality datasets in the networking area. Building a real, high-quality dataset in a laboratory environment is a very challenging and time-consuming task. Moreover, evaluating the quality of a dataset is itself a common problem. Although there are several methods in the data quality assessment field, none is completely functional, and this problem is generally overlooked in network research. During the project, we devoted much effort to this problem. Our goal was to find a way to evaluate the quality of a dataset holistically and automatically, so that it could be used in an autonomous network to assess the quality of datasets collected automatically through telemetry. We proposed the PerQoDA methodology, which is based on permutation testing. This method tests the strength of the relationship between observations and labels. If this relationship is weak, we cannot expect an ML model to perform well. We obtained significant results in this field and prepared two papers that were presented at two high-quality conferences as well as during a research visit to Paris.
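The idea of testing the relationship between observations and labels by permutation can be illustrated with a generic label-permutation test. This sketch is not the PerQoDA implementation, whose details are not given here. It assumes a simple nearest-centroid classifier, a single train/test split and full shuffling of the labels. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-centroid classifier (assumed stand-in model)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test row to every class centroid
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())

def permutation_test(X, y, n_permutations=200, seed=0):
    """Compare the score on the true labels with scores on shuffled labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]
    true_score = centroid_accuracy(X[train], y[train], X[test], y[test])
    null_scores = np.empty(n_permutations)
    for i in range(n_permutations):
        y_perm = rng.permutation(y)  # break the observation-label link
        null_scores[i] = centroid_accuracy(
            X[train], y_perm[train], X[test], y_perm[test]
        )
    # Fraction of permutations that score at least as well as the true labels
    p_value = (1 + np.sum(null_scores >= true_score)) / (1 + n_permutations)
    return true_score, null_scores, p_value

# Synthetic dataset in which feature 0 carries the label information.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4))
X[:, 0] += 3.0 * y
score, null_scores, p_value = permutation_test(X, y)
```

A small p-value indicates that the true labels are predicted far better than shuffled ones, which suggests a strong relationship between observations and labels. If the true score is indistinguishable from the permuted scores, the labelling carries little usable signal for this model.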

During the project, we identified a big gap in networking research: the dataset quality problem. By working on it and disseminating both the problem and the results obtained, we can help create ML models capable of effectively detecting problems in networks. This cannot be achieved without high-quality datasets and methods for evaluating them.
