
Anomaly detection in distributed networks

Final Report Summary - DAD (Anomaly detection in distributed networks)

The main research theme of this project was distributed, non-parametric anomaly (novelty) detection. We considered this problem in an ambitious and practical setting, namely large-scale, complex networks that are distributed and constantly changing. We studied the fundamental performance bounds, analysed data and real-world measurements using spatio-temporal graphical models, and derived state-of-the-art message passing methods. The results of this research will advance the state of the art in the theory and practice of distributed network design and of anomaly detection analysis. The mathematical tools that were developed will also benefit other distributed estimation methods.

The main result is the derivation of a distributed version of principal component analysis (PCA). PCA is a classical non-parametric dimensionality reduction method that has proved successful for anomaly detection in high-dimensional settings. However, standard PCA involves covariance estimation and eigenvalue decompositions, which are difficult to implement in distributed networks. We therefore proposed combining PCA with Gaussian graphical models, which allow for decentralised processing by exploiting prior conditional independence.
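
As a rough illustration of why PCA is useful for anomaly detection, the sketch below implements the classical centralised idea only, not the distributed method derived in the project. It fits the leading principal component of some synthetic data and scores a sample by the energy left over after projecting onto that component. The function names, the synthetic data and the use of a single component are all assumptions made for the example.

```python
import random

def centre(rows):
    """Subtract the per-coordinate mean; return centred rows and the mean."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows], mu

def sample_covariance(centred):
    """Unbiased sample covariance of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Leading eigenvector by power iteration. Assumes a dominant eigenvalue
    and a start vector that is not orthogonal to the leading eigenvector."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def residual_energy(x, mu, v):
    """Squared norm of the part of (x - mu) outside the span of v."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    proj = sum(ci * vi for ci, vi in zip(c, v))
    return sum((ci - proj * vi) ** 2 for ci, vi in zip(c, v))

# Synthetic "normal" traffic (an assumption): three links that move together.
random.seed(0)
rows = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)
    rows.append([t + random.gauss(0.0, 0.1) for _ in range(3)])

centred, mu = centre(rows)
v = top_component(sample_covariance(centred))
normal_score = residual_energy([1.0, 1.0, 1.0], mu, v)    # follows the pattern
anomaly_score = residual_energy([3.0, -3.0, 0.0], mu, v)  # breaks the pattern
```

A sample that follows the dominant pattern leaves almost no residual, while one that breaks it leaves a large one. Note that every step here needs the full covariance matrix in one place, which is the difficulty the paragraph above describes.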

Using this structure and tools from graph theory and convex optimisation, we implemented a distributed PCA method based on message passing. We then identified a prior conditional independence graph in the Abilene network based on its topology. Together, these results allowed for distributed anomaly detection in the network. We verified the performance of our methodology against existing centralised methods and demonstrated its advantage using a real-world dataset.
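
To make the role of conditional independence more concrete, here is a minimal sketch under stated assumptions. It is not the project's message passing algorithm. It uses the standard fact that, for a zero-mean Gaussian vector with precision (inverse covariance) matrix K, the conditional mean of node i given all other nodes is -(1/K[i][i]) times the sum of K[i][j] * x[j] over j != i, so zeros in K mean a node only needs values from its graph neighbours. The 4-node chain and its precision matrix are hypothetical.

```python
# Hypothetical precision matrix for a chain graph 0 - 1 - 2 - 3.
# K[i][j] == 0 encodes conditional independence of nodes i and j
# given all the other nodes.
K = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]

def neighbours(K, i):
    """Nodes that share an edge with node i in the graph encoded by K."""
    return [j for j in range(len(K)) if j != i and K[i][j] != 0.0]

def local_prediction(K, x, i):
    """Conditional mean of x[i] given the rest, for a zero-mean Gaussian.
    Only the measurements held by the neighbours of node i are read."""
    return -sum(K[i][j] * x[j] for j in neighbours(K, i)) / K[i][i]

def local_residuals(K, x):
    """Per-node gap between the measurement and its local prediction."""
    return [x[i] - local_prediction(K, x, i) for i in range(len(x))]

x = [0.0, 1.0, 0.0, 3.0]
residuals = local_residuals(K, x)
```

Each residual can be computed at its own node after exchanging values with direct neighbours only, which is the kind of decentralised processing a sparse graphical model makes possible.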

The research on anomaly detection in high-dimensional networks has raised many theoretical questions about covariance estimation, which is a vital part of our methodology. This is a fundamental and very difficult problem that has recently received considerable attention.

In our search for better estimators, we examined classical centralised techniques and improved them. As originally suggested in the project's proposal, we exploited biased estimation techniques and derived estimators that are provably better than state-of-the-art approaches. This work was done in close collaboration between the outgoing and incoming research groups.
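
One widely used family of biased covariance estimators is linear shrinkage towards a scaled identity matrix. The sketch below illustrates that generic idea and is not one of the estimators derived in the project. The function name and the fixed shrinkage weight `rho` are assumptions; in practice the weight is chosen from the data.

```python
def shrink_covariance(S, rho):
    """Convex combination of the sample covariance S and a scaled identity
    with the same trace. rho = 0 returns S unchanged; rho = 1 returns the
    scaled identity. The estimate is biased for any rho > 0."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    d = len(S)
    scale = sum(S[i][i] for i in range(d)) / d
    return [[(1.0 - rho) * S[i][j] + (rho * scale if i == j else 0.0)
             for j in range(d)] for i in range(d)]

# A rank-one, hence singular, sample covariance, as can arise when there
# are fewer samples than dimensions.
S = [[1.0, 1.0], [1.0, 1.0]]
shrunk = shrink_covariance(S, 0.5)
det_before = S[0][0] * S[1][1] - S[0][1] * S[1][0]
det_after = shrunk[0][0] * shrunk[1][1] - shrunk[0][1] * shrunk[1][0]
```

The shrunk matrix keeps the trace of the original but is invertible, trading a small bias for a better-conditioned estimate.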

Next, we generalised the setting to a distributed one based on graphical models. Using the theories of minimum variance unbiased estimation and Stein's unbiased risk estimation, we derived and analysed provably better estimators, which can considerably improve performance in high-dimensional, small-sample settings.

Altogether, the first period achieved its main training, theoretical, application and collaboration goals. The researcher was introduced to a new mathematical field in which he had no prior experience. A unified framework for distributed anomaly detection in networks was proposed, and novel algorithms were derived and analysed. The results were presented at prestigious conferences and were accepted for publication in the leading signal processing journal. The research was performed jointly by the two hosts, which had no previous record of collaboration, and the two research groups will continue to work closely in the future.

The results of our research are an important contribution to the state of the art in distributed machine learning, both theoretically and algorithmically. Our distributed anomaly detection method can be applied to many practical networks that adhere to internal conditional independence structures, ranging from surveillance systems to complex biological networks. In addition, our identification of a statistical graphical model within the Abilene network can be exploited in the derivation of other distributed processing and monitoring methods for similar computer networks.