Periodic Reporting for period 1 - MAD-SDN (Multivariate Analysis of Big Data in Software Defined Networks)
Reporting period: 2021-03-01 to 2023-02-28
We have also worked on other multivariate techniques based on linear matrix factorization that can be used in the network analytics process: apart from PCA and PLS, we are using ANOVA Simultaneous Component Analysis (ASCA), Parallel Factor Analysis (PARAFAC) and sparse methods. We are preparing both a conference paper and a journal paper. Moreover, we have worked on the extension of MBDA to Federated Learning.
Soon after the beginning of the project, we noticed that machine learning results strongly depend on the quality of the dataset, and we found that there is a general shortage of high-quality datasets in the networking area. Building a real, high-quality dataset in a laboratory environment is a very challenging and time-consuming task. Moreover, evaluating the quality of a dataset is itself a common problem. Although there are several methods in the data quality assessment field, none is fully functional, and the problem is generally overlooked in network research. During the project, we devoted much effort to this problem. Our goal was to find a way to evaluate the quality of a dataset holistically and automatically, so that the method could be used in an autonomous network to assess the quality of datasets collected automatically through telemetry. We proposed the PerQoDA methodology, which is based on permutation testing. This method tests the strength of the relationship between observations and labels. If this relationship is weak, we cannot expect an ML model trained on the dataset to perform well. We obtained significant results in this field and prepared two papers, which were presented at two high-quality conferences as well as during a research visit to Paris.
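The idea behind permutation testing of the relationship between observations and labels can be illustrated with a minimal sketch. This is not the PerQoDA implementation: the nearest-centroid classifier, the single holdout split, the synthetic data and all function names (holdout_accuracy, permutation_test) are assumptions chosen only for illustration. The sketch compares the accuracy obtained with the true labels against the accuracies obtained after randomly shuffling the labels. If the true-label accuracy is not clearly better than the shuffled ones, the observations carry little information about the labels.

```python
import numpy as np


def holdout_accuracy(X, y, train_idx, test_idx):
    """Accuracy of a nearest-centroid classifier on a fixed holdout split.

    The classifier is a hypothetical stand-in for whatever ML model
    would be trained on the dataset.
    """
    X_train, y_train = X[train_idx], y[train_idx]
    classes = np.unique(y_train)
    # One centroid per class seen in the training part.
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from each test observation to each centroid.
    dists = np.linalg.norm(X[test_idx][:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y[test_idx]).mean())


def permutation_test(X, y, n_permutations=200, seed=0):
    """Return (observed accuracy, permutation p-value).

    A small p-value suggests that the observation-label relationship
    is stronger than what shuffled labels produce by chance.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    half = len(y) // 2
    train_idx, test_idx = order[:half], order[half:]

    observed = holdout_accuracy(X, y, train_idx, test_idx)
    # Null distribution: same split and classifier, labels shuffled.
    null = np.array([
        holdout_accuracy(X, rng.permutation(y), train_idx, test_idx)
        for _ in range(n_permutations)
    ])
    # The +1 terms count the observed statistic as one permutation.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Synthetic example (assumed data): two well-separated Gaussian classes.
data_rng = np.random.default_rng(1)
X_demo = np.vstack([
    data_rng.normal(0.0, 1.0, size=(100, 5)),
    data_rng.normal(3.0, 1.0, size=(100, 5)),
])
y_demo = np.repeat([0, 1], 100)

acc, p = permutation_test(X_demo, y_demo)
print(f"informative labels: accuracy={acc:.2f}, p-value={p:.3f}")

# Same observations with labels shuffled once: the relationship is destroyed.
acc_noise, p_noise = permutation_test(X_demo, data_rng.permutation(y_demo))
print(f"shuffled labels:    accuracy={acc_noise:.2f}, p-value={p_noise:.3f}")
```

In this sketch a fixed split is reused for every permutation so that only the labels change between runs. A real assessment would likely use cross-validation and the model actually intended for deployment.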