Community Research and Development Information Service - CORDIS

Final Activity Report Summary - ADMIRE (A distributed data mining infrastructure)

Beyond the challenges inherent in the data mining approaches themselves, the project tackled two main, closely related challenges.

The first was the design, development and implementation of distributed data mining techniques that could take full advantage of distributed platforms while minimising the overheads due to communication and synchronisation. These techniques not only performed the task faster, since the distributed systems provided more processing power, but could also address data heterogeneity: each node processed its data locally, on the assumption that the data held on each processor was more likely to be homogeneous. The second challenge was how to design an infrastructure that would allow users both to develop distributed data mining techniques and to run them on a distributed system without having to deal with low-level details.
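To illustrate the local-processing idea, here is a minimal sketch that is not taken from ADMIRE: global item frequencies are computed by having each node count its own transactions and send only those counts to a coordinator, so the raw data never leaves the node. The function names and the in-process simulation of nodes (a list of partitions) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def local_counts(transactions: Iterable[Iterable[str]]) -> Counter:
    """Runs on each node: count item occurrences in the node's local data only."""
    counts: Counter = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item at most once per transaction
    return counts


def merge_counts(partials: List[Counter]) -> Counter:
    """Runs on the coordinator: merge the small per-node summaries."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def frequent_items(partitions: List[List[List[str]]], min_support: int) -> Dict[str, int]:
    # Each partition stands in for the data held by one node (hypothetical setup);
    # only the Counter summaries, not the transactions, would cross the network.
    partials = [local_counts(p) for p in partitions]
    total = merge_counts(partials)
    return {item: n for item, n in total.items() if n >= min_support}
```

For example, with `node_a = [["milk", "bread"], ["bread"]]` and `node_b = [["milk"], ["milk", "eggs"]]`, `frequent_items([node_a, node_b], min_support=2)` returns `{"milk": 3, "bread": 2}`. Communication cost grows with the number of distinct items rather than with the number of transactions, which is the kind of overhead reduction the paragraph above refers to.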

We developed and implemented an innovative approach to address the second challenge and, in part, the first. We defined a distributed knowledge map that not only helped users to collect and interpret their results but also helped to aggregate the results of the distributed data mining techniques. We ran several experiments comparing the mining techniques with and without the knowledge map, and they showed a significant gain in both the performance and the accuracy of the results.
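This summary does not specify how the knowledge map aggregated results. As one plausible illustration only, the sketch below treats it as a registry of per-node results, each carrying a local accuracy estimate, and combines the local classifiers by accuracy-weighted majority voting. The `LocalResult` and `KnowledgeMap` names and the voting rule are assumptions, not ADMIRE's actual design.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List


@dataclass
class LocalResult:
    """Hypothetical knowledge-map entry describing one node's local model."""
    node: str
    accuracy: float                      # local validation accuracy, assumed in [0, 1]
    predict: Callable[[dict], Hashable]  # the local model's prediction function


@dataclass
class KnowledgeMap:
    """Hypothetical registry that keeps local results and aggregates them."""
    entries: List[LocalResult] = field(default_factory=list)

    def record(self, result: LocalResult) -> None:
        """Register a node's local result so it can be inspected and reused."""
        self.entries.append(result)

    def predict(self, record: dict) -> Hashable:
        """Aggregate the local models by accuracy-weighted majority vote."""
        if not self.entries:
            raise ValueError("no local results recorded")
        votes: Dict[Hashable, float] = defaultdict(float)
        for entry in self.entries:
            votes[entry.predict(record)] += entry.accuracy
        # The label with the largest total weight wins.
        return max(votes, key=votes.get)
```

Usage: after recording a node with accuracy 0.9 that predicts `"high"` when `record["x"] > 5`, plus two nodes with accuracy 0.4 that always predict `"low"`, `predict({"x": 7})` returns `"high"` (weight 0.9 against 0.8), even though most nodes voted `"low"`. Weighting by local quality is one simple way a map of results can improve the accuracy of the aggregated output.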

The ADMIRE architecture included two main layers, namely a core layer and a virtual data grid layer. The ADMIRE core layer was composed of three parts: knowledge discovery, task management, and data or resource management. The role of the first part was to mine and integrate the data and discover global knowledge. The second part contained three modules:

1. data pre-processing
2. distributed data mining (DDM), with two subcomponents: local data mining (LDM) and integration and coordination
3. knowledge map.
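How the three modules fit together can be illustrated with a deliberately simple pipeline. This is a hypothetical sketch, not ADMIRE code: cleaning is reduced to dropping incomplete records, each node's "local model" is just its attribute means and sample count, integration is a count-weighted average (which reproduces the mean over all nodes' data without moving the raw records), and the "knowledge map" step only renders the result as readable statements. All function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
LocalModel = Tuple[int, Dict[str, float]]  # (sample count, per-attribute means)


def preprocess(records: List[Record]) -> List[Dict[str, float]]:
    """Module 1 (sketch): data cleaning only; drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def local_mine(records: List[Dict[str, float]]) -> LocalModel:
    """Module 2, LDM (sketch): summarise one node's data as attribute means."""
    if not records:
        return 0, {}
    keys = records[0].keys()
    return len(records), {k: sum(r[k] for r in records) / len(records) for k in keys}


def integrate(local_models: List[LocalModel]) -> Dict[str, float]:
    """Module 2, integration (sketch): count-weighted average of local means."""
    non_empty = [(n, means) for n, means in local_models if n > 0]
    total = sum(n for n, _ in non_empty)
    if total == 0:
        return {}
    keys = non_empty[0][1].keys()  # assumes every node shares the same schema
    return {k: sum(n * means[k] for n, means in non_empty) / total for k in keys}


def knowledge_map(global_means: Dict[str, float]) -> List[str]:
    """Module 3 (sketch): turn the global model into human-readable statements."""
    return [f"average {k} = {v:.2f}" for k, v in sorted(global_means.items())]


def run_pipeline(partitions: List[List[Record]]) -> List[str]:
    # Pre-processing and local mining would run on each node; integration and
    # interpretation would run wherever the global model is assembled.
    local_models = [local_mine(preprocess(p)) for p in partitions]
    return knowledge_map(integrate(local_models))
```

For instance, with one node holding `{"age": 30.0, "income": 40.0}` and an incomplete record, and another holding two complete records with ages 50 and 40 and incomes 60 and 50, the pipeline yields `["average age = 40.00", "average income = 50.00"]`: the incomplete record is dropped and the global means are weighted by how many clean records each node contributed.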

The first module dealt with data cleaning, data transformation, data reduction, data projection, etc. What distinguished ADMIRE from other systems was that LDM used different mining algorithms for different kinds of data. The local results were then integrated and coordinated to produce global models. The LDM results also had to be collected and analysed in the light of domain knowledge. This was the role of the last module, the knowledge map, which generated significant, interpretable rules, models and knowledge. Moreover, the knowledge map controlled the entire data mining process by proposing different strategies both for mining and for integrating and coordinating the results, in order to improve performance. Further details can be found in the papers published by the project consortium.
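The idea of using different local algorithms for different kinds of data can be sketched as a small dispatch table. This is an illustrative assumption rather than ADMIRE's mechanism: a hypothetical profiling step classifies a node's column as numeric or categorical, and the "algorithms" are trivial summaries standing in for real mining methods such as clustering or rule induction.

```python
from collections import Counter
from statistics import mean
from typing import Any, Callable, Dict, List


def detect_kind(column: List[Any]) -> str:
    """Hypothetical profiling step: classify a node's column by value type."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in column):
        return "numeric"
    return "categorical"


def mine_numeric(column: List[float]) -> Dict[str, Any]:
    """Placeholder for a numeric-data algorithm: simple descriptive statistics."""
    return {"kind": "numeric", "mean": mean(column), "min": min(column), "max": max(column)}


def mine_categorical(column: List[Any]) -> Dict[str, Any]:
    """Placeholder for a categorical-data algorithm: most frequent value and its support."""
    value, count = Counter(column).most_common(1)[0]
    return {"kind": "categorical", "mode": value, "support": count / len(column)}


# Registry mapping each kind of data to the local algorithm used for it.
LOCAL_MINERS: Dict[str, Callable[[List[Any]], Dict[str, Any]]] = {
    "numeric": mine_numeric,
    "categorical": mine_categorical,
}


def local_data_mining(column: List[Any]) -> Dict[str, Any]:
    """Pick the local algorithm according to the kind of data held on this node."""
    if not column:
        raise ValueError("empty column")
    return LOCAL_MINERS[detect_kind(column)](column)
```

Here `local_data_mining([3, 5, 7])` produces a numeric summary with mean 5, while `local_data_mining(["a", "b", "a", "a"])` reports mode `"a"` with support 0.75. A registry keeps the choice of algorithm in one place, which is also where a strategy-proposing component could swap algorithms without touching the nodes' code.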

The other layer, called the virtual data grid, was a portable layer for data grid environments. Building on a set of grid services brought several benefits: for instance, developers did not have to spend time dealing with heterogeneous organisations, platforms, data sources, etc., and software distribution was easier. To make ADMIRE more portable and more flexible with regard to the many existing data grid platforms, we built this portable layer as a virtual grid platform. It provided a general service-operation interface to the upper layers and homogenised different grid middleware by mapping data mining tasks either to grid services, according to the Open Grid Services Architecture/Web Services Resource Framework (OGSA/WSRF) standard, or to entities in the DGET model. Thanks to this portable layer, ADMIRE could easily be ported to many kinds of data grid platforms, such as Globus or DGET.
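The portability argument corresponds to the familiar adapter pattern: the upper layers program against one interface, and each middleware gets its own mapping. The sketch below is hypothetical. The backend classes only return description strings in invented formats and make no real Globus, OGSA/WSRF or DGET calls; `MiningTask`, `GridBackend` and `VirtualDataGrid` are illustrative names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class MiningTask:
    """A data mining task as seen by the upper layers (hypothetical)."""
    name: str
    dataset: str


class GridBackend(ABC):
    """Uniform interface the core layer would program against (hypothetical)."""

    @abstractmethod
    def submit(self, task: MiningTask) -> str:
        """Map a task onto the middleware and return a handle describing it."""


class ServiceStyleBackend(GridBackend):
    """Placeholder for a service-oriented mapping; the handle format is invented."""

    def submit(self, task: MiningTask) -> str:
        return f"service:{task.name}?resource={task.dataset}"


class EntityStyleBackend(GridBackend):
    """Placeholder for an entity-model mapping; the handle format is invented."""

    def submit(self, task: MiningTask) -> str:
        return f"entity:{task.dataset}/{task.name}"


class VirtualDataGrid:
    """Portable layer: the same calls work whichever backend is plugged in."""

    def __init__(self, backend: GridBackend) -> None:
        self._backend = backend

    def run(self, tasks: List[MiningTask]) -> List[str]:
        return [self._backend.submit(t) for t in tasks]
```

Switching platforms then means constructing `VirtualDataGrid` with a different backend. For example, `VirtualDataGrid(ServiceStyleBackend()).run([MiningTask("ldm", "sales")])` returns `["service:ldm?resource=sales"]`, and the calling code is unchanged for `EntityStyleBackend`.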
