Community Research and Development Information Service - CORDIS

Prototype tool for data mining tasks

A number of tools and techniques have been developed to deal with large volumes of data and to discover pertinent information and knowledge in large data repositories. Unfortunately, current data mining approaches under-address the most important requirement: revealing and handling uncertainty in the context of data mining tasks. The current research has implemented a prototype tool for data mining tasks that addresses the problems of quality assessment and uncertainty handling.
The prototype tool developed by the current research project is a client/server system, implemented in Java and connected to a data set. The tool, the outcome of a project named UMINER, handles uncertainty and guarantees quality when conducting the main tasks of the data mining process. Data mining systems are widely used, particularly in electronic banking, insurance services and general information network management.

Partitioning of the relevant data set precedes the extraction of knowledge from large databases. In simple terms, partitioning divides the original, very large data set into labelled subsets of smaller volume. Until now, the partitioning of a data set was produced by a clustering algorithm that required the number of clusters (subsets) to be selected a priori. In contrast, the UMINER tool determines the optimal number of clusters for the given data set. Research partners have developed a methodology for extracting optimal clustering schemes that combines established clustering algorithms with quality measures used to select the best clustering scheme among those produced.
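The idea of letting a quality measure choose the number of clusters can be sketched as follows. This is only an illustration under stated assumptions, not the UMINER implementation: the source names neither the clustering algorithm nor the quality measures, so plain k-means on one-dimensional data and a simple compactness/separation ratio are used as stand-ins, and all function names are hypothetical.

```python
def kmeans_1d(values, k, max_iterations=50):
    """Plain 1-D k-means with deterministic initial centres (assumed stand-in algorithm)."""
    data = sorted(values)
    n = len(data)
    # Initial centres: evenly spaced picks from the sorted data.
    centres = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Recompute each centre as the mean of its cluster; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return centres, clusters


def validity_index(centres, clusters):
    """Assumed quality measure: mean within-cluster scatter divided by the
    smallest squared distance between two centres. Smaller is better."""
    n = sum(len(c) for c in clusters)
    compactness = sum((x - centres[j]) ** 2 for j, c in enumerate(clusters) for x in c) / n
    separation = min((a - b) ** 2 for i, a in enumerate(centres) for b in centres[i + 1:])
    return float("inf") if separation == 0 else compactness / separation


def best_cluster_count(values, k_min=2, k_max=6):
    """Run the clustering for each candidate k and keep the k with the best index."""
    scores = {}
    for k in range(k_min, k_max + 1):
        centres, clusters = kmeans_1d(values, k)
        scores[k] = validity_index(centres, clusters)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    sample = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1, 9.0, 9.3, 8.8, 9.1]
    best_k, scores = best_cluster_count(sample)
    print(best_k, scores)
```

The point of the design is that the number of clusters is an output of the procedure rather than an input: each candidate scheme is scored, and the scheme with the best score is retained.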

The optimal clusters defined by the preceding clustering process form categories that maintain the classification belief, that is, the degree to which a value is believed to belong to each category. Fuzzy logic is used to represent and manipulate this belief. The result is a scheme for classifying non-categorical attribute values into categories while preserving the classification belief.
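A minimal sketch of such a fuzzy classification is given below. The source does not specify the membership functions, so this example assumes triangular memberships anchored at cluster centres; the function name, the centres and the labels are all hypothetical.

```python
def classification_beliefs(x, centres, labels):
    """Map a numeric value to {label: degree of belief in [0, 1]}.

    Assumes `centres` is sorted ascending and has one entry per label.
    Between two neighbouring centres the belief is shared linearly,
    so the degrees for any value sum to 1.
    """
    beliefs = {label: 0.0 for label in labels}
    if x <= centres[0]:
        beliefs[labels[0]] = 1.0
    elif x >= centres[-1]:
        beliefs[labels[-1]] = 1.0
    else:
        for j in range(len(centres) - 1):
            left, right = centres[j], centres[j + 1]
            if left <= x <= right:
                weight = (x - left) / (right - left)
                beliefs[labels[j]] = 1.0 - weight
                beliefs[labels[j + 1]] = weight
                break
    return beliefs


if __name__ == "__main__":
    # Hypothetical categories derived from three cluster centres.
    print(classification_beliefs(4.0, [1.0, 5.0, 9.0], ["low", "medium", "high"]))
```

Unlike a crisp classification, which would simply assign 4.0 to "medium", the fuzzy scheme keeps the information that the value also belongs to "low" to a lesser degree.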

Another innovative feature of the proposed data mining tool is that it provides information measures for decision support. These measures for the classification scheme are based on the energy metric function, which reflects the quantity of information contained in a fuzzy set. The information measures provide a basis for extracting “useful” knowledge that can support reasoning and decision-making. The most important advantage the system offers is that it enhances data mining processes by supporting uncertainty in terms of belief.
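The source does not give the formula for the energy metric. As a rough illustration only, the sketch below assumes one common definition of the energy of a fuzzy set: the sum of the membership degrees of all items, each raised to a power q. The function names, the parameter q and the sample data are hypothetical.

```python
def energy(memberships, q=1.0):
    """Assumed energy of a fuzzy set: sum of membership degrees raised to the power q."""
    return sum(mu ** q for mu in memberships)


def category_energies(belief_rows, q=1.0):
    """Compute the assumed energy of each category over a data set.

    `belief_rows` holds one {label: degree} mapping per record.
    """
    labels = belief_rows[0].keys() if belief_rows else []
    return {
        label: energy((row[label] for row in belief_rows), q)
        for label in labels
    }


if __name__ == "__main__":
    rows = [
        {"low": 1.0, "medium": 0.0},
        {"low": 0.25, "medium": 0.75},
        {"low": 0.0, "medium": 1.0},
    ]
    print(category_energies(rows))
```

Under this assumption, a category with higher energy is one that the data set supports with more overall belief, which gives a decision maker a single figure per category to compare.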

At the current stage of development, project partners have designed and developed the main steps of this new approach: a framework for clustering, classification and association rule extraction over large relational databases that handles uncertainty in terms of belief measures in data mining tasks. In the near future, new modules will be developed for the system, and the clustering and classification rules will be integrated into a fully functional, innovative data mining system.
Record Number: 80336 / Last updated on: 2005-09-18
Domain: IT, Telecommunications