
Agent-Oriented Distributed Data Mining using Computational Statistics

Final Report Summary - ADMIT (Agent-Oriented Distributed Data Mining using Computational Statistics)

Today's systems for managing critical infrastructure such as traffic, energy, or industry automation systems are highly complex, distributed, and increasingly decentralized. Multi-agent systems (MAS) provide an intuitive metaphor and configurable, robust and scalable methods for problem-solving and control in distributed, decentrally organized systems. In such large-scale software systems, the behavior of individual components depends on information observed from distributed sources. The purpose of Distributed Data Mining (DDM) is to provide algorithmic solutions for data analysis in a distributed manner to detect hidden patterns in data and extract knowledge necessary for decentralized decision making.
In ADMIT, we focus on methods for the distributed estimation of parameters for individual agents, agent communities, and application-level information models. Our approach is based on computational statistics (CST), a set of methods for the approximate solution of statistical problems without complex statistical procedures. The goal of the project is to advance towards an agent-oriented (AO) DDM framework comprising computationally efficient, robust, and easy-to-apply methods for model parameter estimation.
The scientific research objectives of ADMIT are: 1) to develop a conceptual architecture of the AO DDM framework and a methodology for its use in multi-agent programming frameworks; 2) to develop a set of computationally efficient CST-based DDM methods, robust to poor data quality, for estimating model parameters from distributed data at different levels of the MAS, and to evaluate the performance of these methods; 3) to assess the impact of incorporating the DDM framework into MAS-based applications (focused on the traffic and logistics domains).
The results of these objectives were presented by Dr. Fiosina and co-authors at 11 international conferences. During the reporting period, 16 project-related papers were published in peer-reviewed journals and conference proceedings. To provide evidence of the quality and efficiency of the proposed methods, the experimental project results were integrated into traffic-domain application use cases and validated using real-world traffic data from the city of Hannover.
The project's achievements (results and impacts) are divided into six results. The first four are the significant scientific results of the project, the fifth covers the steps towards the researcher's future career development, and the last describes training, integration and transfer-of-knowledge activities.
A1: A traffic routing problem with decentralized decision making by vehicle agents in an urban traffic system was investigated. The planning process for a vehicle agent is separated into two stages: strategic planning, which selects the optimal route, and tactical planning, which determines how to pass the current street in the optimal manner. A MAS architecture for this problem was developed; data flows and scenarios were analyzed. The CST-based DDM algorithms needed at the strategic planning stage were developed, namely the comparison of two routes in a stochastic graph and the shortest-path search, together with formulas for estimating the efficiency of the proposed methods. These results were summarized in four publications.
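The route comparison in A1 can be illustrated with a minimal resampling sketch. It is not the project's algorithm: the function name `prob_route_faster`, the data layout (one list of observed travel times per street) and the sample values are assumptions made for illustration. The sketch estimates the probability that one route is faster than another by repeatedly drawing one observed travel time per street and comparing the totals.

```python
import random

def prob_route_faster(route_a, route_b, n_resamples=2000, seed=0):
    """Estimate P(time(route_a) < time(route_b)) by resampling.

    Each route is a list of streets (edges); each street is a list of
    observed travel times. All names here are hypothetical.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_resamples):
        # Draw one observed time per street, with replacement, and sum them.
        time_a = sum(rng.choice(edge) for edge in route_a)
        time_b = sum(rng.choice(edge) for edge in route_b)
        if time_a < time_b:
            wins += 1
    return wins / n_resamples

# Illustrative observations in minutes (made-up values).
route_a = [[4.0, 5.0, 6.0], [3.0, 3.5, 4.0]]
route_b = [[9.0, 10.0, 11.0], [2.0, 2.5]]
p_a_faster = prob_route_faster(route_a, route_b)
```

A vehicle agent could prefer `route_a` when the estimate exceeds a chosen threshold; the threshold and the number of resamples are tuning choices that this sketch leaves open.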
A2: Change point analysis for data processing and mining by intelligent agents in city traffic was investigated. The necessary agent-oriented architectures, scenarios and data flows were described. Two CST-based resampling tests for change point detection were proposed and implemented at the DDM layer of the agent logic. Two traffic scenarios were considered. These results were summarized in two publications.
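As an illustration of the kind of resampling test mentioned in A2, the following sketch uses a generic permutation test for a single shift in the mean. It is a stand-in, not one of the two tests proposed in the project; the statistic (largest gap between the means before and after a split), the function names and the sample data are all assumptions.

```python
import random

def max_mean_gap(xs):
    """Return (largest |mean(left) - mean(right)| over all splits, split index)."""
    n = len(xs)
    total = sum(xs)
    left = 0.0
    best_gap, best_k = 0.0, None
    for k in range(1, n):
        left += xs[k - 1]
        gap = abs(left / k - (total - left) / (n - k))
        if gap > best_gap:
            best_gap, best_k = gap, k
    return best_gap, best_k

def change_point_test(xs, n_perm=999, seed=0):
    """Permutation test: is the observed gap larger than under random order?"""
    rng = random.Random(seed)
    observed, k = max_mean_gap(xs)
    shuffled = list(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any ordering, hence any change point
        if max_mean_gap(shuffled)[0] >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return k, (1 + hits) / (1 + n_perm)

# Made-up travel times with a jump after the eighth observation.
times = [10, 11, 9, 10, 11, 10, 9, 10, 20, 21, 19, 20, 21, 20, 19, 20]
split, p_value = change_point_test(times)
```

A small p-value suggests that the ordering of the observations matters, that is, that the mean changed at or near index `split`.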
A3: We focused on the travel time forecasting problem, introducing decentralization by developing a MAS architecture. As forecasting models, we investigated decentralized linear and kernel-based multivariate regression. To handle streaming data, the linear regression model was extended with an iterative least squares parameter estimation method. To synchronize the distributed model parts, we proposed an original resampling-based consensus method. These results were summarized in four publications.
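The iterative least squares estimation mentioned in A3 can be sketched with the textbook recursive least squares (RLS) update, which refines the coefficients one observation at a time without storing past data. This is a generic formulation under assumed names (`RecursiveLeastSquares`, `delta`), not the project's implementation, and it leaves out the consensus step between agents.

```python
class RecursiveLeastSquares:
    """Textbook RLS for y ~ theta . x, updated one observation at a time."""

    def __init__(self, n_features, delta=1e4):
        # theta: current coefficients; P: scaled inverse covariance estimate.
        # A large delta means a weak prior pulling theta towards zero.
        self.theta = [0.0] * n_features
        self.P = [[delta if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def predict(self, x):
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x, y):
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - self.predict(x)
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P stays symmetric, so x^T P equals the transpose of P x.
        self.P = [[self.P[i][j] - gain[i] * Px[j] for j in range(n)]
                  for i in range(n)]

# Made-up stream: travel time = 2 + 3 * load, features are [1, load].
model = RecursiveLeastSquares(n_features=2)
for load in range(10):
    model.update([1.0, float(load)], 2.0 + 3.0 * load)
forecast = model.predict([1.0, 10.0])
```

Each update costs a number of operations quadratic in the number of features, independent of how many observations have already been seen, which is what makes this form suitable for streaming data.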
A4: The problem of decentralized kernel-density-based clustering was considered and an appropriate MAS architecture was developed. Two decentralized clustering models were proposed. The first model is non-parametric and based on kernel density functions, where the extrema were found using an iterative expectation maximization algorithm. To synchronize the distributed model parts, agent cooperation based on the transmission of selected data points was assumed. The second model is semi-parametric and based on both kernel density functions and an approximated density of multivariate normal distributions. The expectation maximization step remains the same, but the cooperation procedure differs: instead of data points, the estimated parameter values of a mixture of multivariate normal distributions are transmitted. These results were described in three publications.
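To make the idea of clustering by the extrema of a kernel density estimate concrete, the sketch below uses a one-dimensional mean-shift style fixed-point iteration with a Gaussian kernel. This is a substituted, well-known technique chosen for illustration and may differ from the project's expectation maximization procedure; the function names, the bandwidth, the merge distance and the data are assumptions, and agent cooperation is not modeled.

```python
import math

def shift_to_mode(start, points, bandwidth=0.5, tol=1e-6, max_iter=200):
    """Move `start` uphill on the Gaussian kernel density of `points`."""
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def cluster_by_mode(points, bandwidth=0.5, merge_dist=0.5):
    """Label each point by the density mode its iteration converges to."""
    modes, labels = [], []
    for p in points:
        m = shift_to_mode(p, points, bandwidth)
        for idx, known in enumerate(modes):
            if abs(m - known) < merge_dist:
                labels.append(idx)
                break
        else:
            # No existing mode is close enough: start a new cluster.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Made-up observations forming two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
modes, labels = cluster_by_mode(data)
```

In a decentralized setting, each agent would run such an iteration on its local data and exchange either selected data points or fitted mixture parameters, as described for the two models above.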
A5: A major result of the project was the development of a strategic personal research agenda with a focus on Big Data analysis in conjunction with the use of cloud resources in the domain of traffic information systems. Initial work towards this agenda was performed, and several papers were published describing challenges and solution approaches in this area. A research grant proposal, ANACONDA (Decentralised Big Data Analysis in Complex Networked Applications), was prepared and submitted to the Deutsche Forschungsgemeinschaft (DFG) in July 2014. If successful, this grant will supply research funding for three years and play a crucial role in helping Dr. Fiosina to complete her Habilitation.
A6 (training, integration and transfer-of-knowledge activities):
• After attending intensive German language courses, Dr. Fiosina obtained a certificate of German language knowledge equivalent to C2 (proficient user level).
• She improved her English language skills by attending language courses.
• She improved her simulation skills by attending two post-graduate level courses, ‘Stochastische Simulation’ and ‘Diskrete ereignisorientierte Simulation mit ExtendSim’, organized by the Doctoral School of Operations Management & Research of the Lower Saxony Technical University.
• Dr. Fiosina taught a module on ‘Distributed Data Analysis’ (32 hours) as part of the course ‘Distributed Data Analysis and Machine Learning’ for graduate and PhD students.
• She has been involved in the scientific and educational coordination of the inter-university e-learning project ALTANTIS.
• She participated in preparing the scientific project proposal titled ‘Robotic Fire Fighters’, which obtained financing from the Lower Saxony Technical University.
• Dr. Fiosina gave several talks presenting the progress and results of the ADMIT project in the MEC Group at TU Clausthal. She also gave an invited talk, ‘An Overview of Resampling Methods for Analysis of Stochastic Systems’, on 4.11.2011 during her visit to the Institute of Mathematics of Potsdam University, initiated by Prof. Dr. habil. Hannelore Liero.
• Dr. Fiosina improved her networking capabilities to foster contacts among people working in related areas of interest across multiple locations and institutions inside and outside TU Clausthal. She made various contacts by attending international conferences and through MEC Group research collaborations. She established cooperation with, for example, the University of Potsdam, the University of St. Petersburg and the University of Sydney.
• She has served as a reviewer for five international conferences.

Dr. Jelena Fiosina’s contact data: Clausthal University of Technology, Institute of Informatics, Julius-Albert-Straße 4, Raum 219, e-mail: jelena.fiosina@gmail.com
ADMIT Website: http://www.in.tu-clausthal.de/personen/aktuelle/dr-jelena-fiosina/admit/