CORDIS - EU research results
Content archived on 2024-05-28

Distributed Multi-way Analysis of Stream Data for Detection of Complex Attacks

Final Report Summary - DMASD4CA (Distributed multi-way analysis of stream data for detection of complex attacks)

Overview:
The focus of the project is mining complex data, which may be real-time, continuous, high-dimensional and multimodal, and may arrive (or change) at different rates and volumes. Several characteristics of complex stream data require further research. For example, in many such applications analysing data at a single location is inefficient in terms of accuracy, space and computational complexity. Moreover, queries may need to be processed in near real time. One shortcoming of the current state of the art is that, although well-understood statistical and algorithmic techniques are available for simple mining and summaries, computing more sophisticated summaries such as rank reduction in an on-line manner remains a difficult problem. We note that singular value decomposition (SVD) and similar approaches work on two-dimensional (2D) arrays and discover linear separations of data. However, if data has non-linear or multi-linear structure, then off-line algorithms such as SVD fail to capture it. M-way data analysis techniques (e.g. tensor decompositions) consider multiple modes of data simultaneously (e.g. a data cube as opposed to a data matrix) to discover multi-linear structure. Similarly, support vector machines and kernel methods are used to analyse non-linear data. In this research, we consider the mining of complex data by collaborative (distributed) data collection, multi-way analysis and knowledge extraction.
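
For reference, the contrast between a two-way and a multi-way model can be written as follows. The first line is the standard truncated SVD of a data matrix; the second is the CP (CANDECOMP/PARAFAC) form, one common tensor decomposition of a three-way data cube. The summary does not state which decomposition the project used, so the CP form and the symbols below are illustrative assumptions only.

```latex
X \approx \sum_{r=1}^{R} \sigma_r \, u_r v_r^{\top},
\qquad X \in \mathbb{R}^{I \times J}

\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r,
\qquad \mathcal{X} \in \mathbb{R}^{I \times J \times K}
```

Here $\circ$ denotes the vector outer product, so each term of the second sum couples all three modes at once rather than only rows and columns.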

Timeliness, relevance and objectives of the project:
This research considers fundamental questions that are key to improving the mining of complex stream data: How can data be collected in a distributed way that optimises accuracy while minimising intrusion and performance degradation? How should the number and location of agents (programs that collect data) be decided? How should the communication and coordination of the data be organised for mining and calculation? How can the accuracy and performance of such a collaborative system be measured? Accordingly, this project demonstrates how to:
(i) collect multidimensional data in near real time;
(ii) construct multimodal models to fit this data;
(iii) decompose such M-way models using tensor decompositions;
(iv) build a simulator that uses the statistics obtained in (i) to test the accuracy of the results in (iii).

Contribution, originality and innovation:
This research addresses several important issues for the sampling and analysis of complex stream data. First, coordinated sampling must be done to ensure a notion of independence among the agents. The allocation and coordination of agents must be a function of the properties of the data stream. For example, in this project we considered profiling a user's resource usage in a time-sharing system in order to detect anomalies and intrusions. In this context, the data have multiple dimensions, collected from various system resources (e.g. CPU usage, memory usage) by a program that monitors the system continuously over time. This program can run on multiple time-sharing machines and, over time, construct an accurate signature of a user. Second, on-line or near-real-time versions of data analysis methods must be designed. For example, it is not known how to design sliding-window-like algorithms for tensor decomposition to analyse data with multiple modes.
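
As an illustration of how such monitoring output could be arranged as three-way data, the following minimal sketch places samples into a user x resource x time array. The function name `build_usage_tensor`, the sample format `(user, resource, step, value)` and the choice to average repeated samples (for instance, reports from several machines) are assumptions made for this example; the summary does not describe the project's actual data layout.

```python
def build_usage_tensor(samples, users, resources, n_steps):
    """Arrange (user, resource, step, value) samples as tensor[user][resource][step].

    Hypothetical layout: cells with no sample stay 0.0, and repeated samples
    for the same cell are averaged (an assumption, not the project's rule).
    """
    u_idx = {u: i for i, u in enumerate(users)}
    r_idx = {r: j for j, r in enumerate(resources)}
    tensor = [[[0.0] * n_steps for _ in resources] for _ in users]
    counts = [[[0] * n_steps for _ in resources] for _ in users]

    # Accumulate sums and sample counts per cell.
    for user, resource, step, value in samples:
        i, j = u_idx[user], r_idx[resource]
        tensor[i][j][step] += value
        counts[i][j][step] += 1

    # Turn sums into averages where more than one sample landed in a cell.
    for i in range(len(users)):
        for j in range(len(resources)):
            for k in range(n_steps):
                if counts[i][j][k] > 1:
                    tensor[i][j][k] /= counts[i][j][k]
    return tensor


samples = [
    ("alice", "cpu", 0, 0.2),
    ("alice", "cpu", 0, 0.4),  # second machine reporting the same time step
    ("bob", "mem", 1, 0.5),
]
usage = build_usage_tensor(samples, ["alice", "bob"], ["cpu", "mem"], 2)
```

Each slice `usage[i]` is then one user's resource-by-time profile, which is the kind of object a multi-way decomposition would summarise into a signature.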

In this project, we developed techniques that can analyse three-way data using a sliding-window-like decomposition algorithm. The 3-way tensor analysis techniques were developed to find the structure in this data and to identify a signature for the resource usage of each user in a collaborative environment, which can be used for threat analysis. Distributed versions of these algorithms are straightforward extensions, since each data collection point can construct a portion of the signature and periodically dump the data to a shared and secure directory. Thus, the project achieved the goals put forward.
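
The idea of decomposing three-way data over a sliding window can be sketched as follows. This is not the project's algorithm, which the summary does not specify: it is a plain rank-1 approximation computed by alternating updates (the higher-order power method), recomputed from scratch for every window instead of being updated incrementally. The function names, the all-ones initialisation and the assumption of non-negative usage values are choices made for this example.

```python
import math


def _normalize(v):
    """Return (unit vector, Euclidean norm); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v), 0.0
    return [x / norm for x in v], norm


def rank1_decompose(tensor, iterations=50):
    """Rank-1 approximation weight * (a o b o c) of tensor[i][j][k].

    Uses alternating updates; assumes non-negative data so that the
    all-ones starting vectors are not orthogonal to the dominant component.
    """
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, _ = _normalize([1.0] * I)
    b, _ = _normalize([1.0] * J)
    c, _ = _normalize([1.0] * K)
    weight = 0.0
    for _ in range(iterations):
        # Update each mode's vector while holding the other two fixed.
        a, _ = _normalize([sum(tensor[i][j][k] * b[j] * c[k]
                               for j in range(J) for k in range(K))
                           for i in range(I)])
        b, _ = _normalize([sum(tensor[i][j][k] * a[i] * c[k]
                               for i in range(I) for k in range(K))
                           for j in range(J)])
        c, weight = _normalize([sum(tensor[i][j][k] * a[i] * b[j]
                                    for i in range(I) for j in range(J))
                                for k in range(K)])
    return weight, a, b, c


def sliding_window_signatures(tensor, window, step=1):
    """Yield (start, weight, a, b, c) for each window along the third (time) mode."""
    n_steps = len(tensor[0][0])
    for start in range(0, n_steps - window + 1, step):
        sub = [[row[start:start + window] for row in user] for user in tensor]
        yield (start,) + rank1_decompose(sub)


# Synthetic, exactly rank-1 data: users x resources x 4 time steps.
a0, b0 = [1.0, 2.0], [1.0, 3.0]
cube = [[[ai * bj for _ in range(4)] for bj in b0] for ai in a0]
signatures = list(sliding_window_signatures(cube, window=2))
```

In a distributed setting, each collection point could run such a routine on its own portion of the data and write the resulting factors to the shared directory; how the portions are merged is left open here.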