The project aims at new techniques that give decision-makers direct access to information stored in databases, data warehouses, and knowledge bases. The main goal is the integration of data and knowledge management. Discovery techniques produce knowledge from very large sets of distributed data and exploit domain knowledge to deliver more concise and relevant insights. The main obstacle to achieving this goal is finding the appropriate representation for a discovery task.
The project will develop new techniques that support user-guided adjustment of representations, as well as techniques that select or change representations automatically. Cases in which a particular representation was used successfully for a certain discovery task are stored and provide users with an adaptive interface to information. The project's results will be an advanced data mining system, a case-base of typical discovery tasks, and new pre-processing operators.
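A case-base of this kind could, for instance, retrieve the stored case whose task description is closest to a new task and offer its representation for reuse. The sketch below assumes each case is described by a few numeric meta-features (the names `rows`, `attributes`, `has_time` and the pre-processing step names are hypothetical) and uses range-normalised nearest-neighbour retrieval; the project's actual case representation and similarity measure are not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    """A stored case: a description of a discovery task plus the
    pre-processing steps (representation) that worked for it."""
    name: str
    features: dict[str, float]   # hypothetical task meta-features
    representation: list[str]    # hypothetical names of pre-processing steps


def retrieve(case_base: list[Case], query: dict[str, float], k: int = 1) -> list[Case]:
    """Return the k cases whose task descriptions are closest to the query."""
    # Scale each meta-feature by its observed range so that large-valued
    # features (e.g. row counts) do not dominate the distance
    scale = {}
    for key in query:
        values = [c.features[key] for c in case_base if key in c.features] + [query[key]]
        spread = max(values) - min(values)
        scale[key] = spread if spread > 0 else 1.0

    def distance(case: Case) -> float:
        # Euclidean distance over the meta-features both descriptions share
        shared = case.features.keys() & query.keys()
        return math.sqrt(sum(((case.features[f] - query[f]) / scale[f]) ** 2 for f in shared))

    return sorted(case_base, key=distance)[:k]


cases = [
    Case("churn", {"rows": 1e6, "attributes": 40, "has_time": 1.0},
         ["discretise", "window_aggregate"]),
    Case("basket", {"rows": 5e5, "attributes": 200, "has_time": 0.0},
         ["select_features", "binarise"]),
]
best = retrieve(cases, {"rows": 9e5, "attributes": 50, "has_time": 1.0})[0]
print(best.name, best.representation)
```

Range normalisation is one simple choice; a richer case-base would likely weight meta-features by how well they predict which representation succeeds.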
An environment for knowledge discovery from databases (KDDE) will be developed that provides decision-makers with advanced knowledge extraction from large distributed data sets. New techniques for selecting and constructing features from the given data will be developed. For instance, ways of handling time (time series, relations between time intervals, validity of discovered rules), discovering hidden variables, and detecting interdependencies among features will be investigated. These techniques target pre-processing, the stage of knowledge discovery where most time is currently spent. Data mining will exploit domain knowledge, which will enhance the quality of its results. A case-base of discovery tasks, together with the pre-processing techniques they require, will offer an adaptive interface to the KDDE. This will speed up similar knowledge discovery applications and make the KDDE self-improving.
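One standard way to describe relations between time intervals is Allen's interval algebra, which distinguishes 13 qualitative relations. The project does not say which formalism it uses, so the function below is only an illustration of how such relations could be turned into a categorical feature; `allen_relation` is a hypothetical name, and intervals are assumed to be `(start, end)` pairs with `start < end`.

```python
def allen_relation(a: tuple[float, float], b: tuple[float, float]) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # From here on the intervals overlap and share no endpoint
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


# Example: derive a categorical feature from two periods recorded for one
# customer, e.g. a promotion period and a complaint period (hypothetical data)
promotion, complaint = (1, 5), (3, 8)
print(allen_relation(promotion, complaint))
```

Computed for every customer, such a relation becomes an ordinary nominal attribute that a standard learner can use.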
The scientific research for enabling end-users to gain knowledge from databases and data warehouses is organised in two themes: a meta-data model and multi-strategy learning. The meta-data model provides constraints for pre-processing and for pairing business tasks with algorithms (WP1, WP8, WP10, WP18). An in-depth analysis of feature selection, sampling, transformation and mining operators will be carried out. Multi-strategy learning systematically explores combinations and (automatic) parameter settings of diverse learning operators for pre-processing, particularly for feature selection and construction (WP4, WP13, WP14). Handling of multi-relational data (WP15) and of time phenomena (WP3), together with the inclusion of domain knowledge (WP5), enhances discovery. The technological work centres on an advanced KDD support environment (WP1, WP2, WP7, WP12, WP16). The scientific and technological efforts yield a case-base of best-practice discovery (WP10) that can be used by users of the environment and is published on the Internet for an international "representation race".
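The systematic exploration of operator parameter settings described above can be pictured, in its simplest form, as an exhaustive grid search over a chain of pre-processing operators. In this sketch `drop_sparse` is a hypothetical feature-selection operator and `evaluate` is a toy stand-in for what would in practice be, for example, the cross-validated accuracy of a learner on the transformed data; none of these names come from the project.

```python
from itertools import product
from typing import Any, Callable

Rows = list[dict[str, float]]


def drop_sparse(rows: Rows, min_fill: float) -> Rows:
    """Hypothetical feature-selection operator: keep only the attributes whose
    fraction of non-zero values is at least min_fill."""
    keep = [k for k in rows[0] if sum(r[k] != 0 for r in rows) / len(rows) >= min_fill]
    return [{k: r[k] for k in keep} for r in rows]


def search_settings(
    data: Rows,
    operators: list[tuple[Callable[..., Rows], dict[str, list[Any]]]],
    evaluate: Callable[[Rows], float],
) -> tuple[float, list[dict[str, Any]]]:
    """Try every combination of parameter settings for a fixed chain of
    operators and return the best score with the settings that produced it."""
    # Expand each operator's grid into concrete keyword-argument dicts
    per_operator = []
    for _, grid in operators:
        names = list(grid)
        per_operator.append(
            [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]
        )
    best_score, best_settings = float("-inf"), []
    for settings in product(*per_operator):
        transformed = data
        # Apply the operators in order, each with its chosen parameters
        for (operator, _), kwargs in zip(operators, settings):
            transformed = operator(transformed, **kwargs)
        score = evaluate(transformed)
        if score > best_score:  # ties keep the earliest setting
            best_score, best_settings = score, list(settings)
    return best_score, best_settings


# Toy stand-in for a real quality measure: reward attributes assumed to be
# relevant and penalise all others
RELEVANT = {"age", "income", "promo"}


def evaluate(rows: Rows) -> float:
    return sum(1 if k in RELEVANT else -1 for k in rows[0])


data = [
    {"age": 30, "income": 2.0, "promo": 0, "noise": 0},
    {"age": 45, "income": 3.5, "promo": 1, "noise": 0},
    {"age": 28, "income": 1.2, "promo": 0, "noise": 5},
    {"age": 52, "income": 4.1, "promo": 1, "noise": 0},
]
score, settings = search_settings(data, [(drop_sparse, {"min_fill": [0.0, 0.3, 0.6]})], evaluate)
print(score, settings)
```

The cost of exhaustive search grows multiplicatively with every added operator and parameter, so larger search spaces would call for greedy, sampled, or learned strategies instead.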
Applications ensure that research and technology focus on the most challenging and in-demand issues. A data warehouse provided by one partner and a set of data mining applications from the data mining consultancy work of two of the partners will be used to evaluate the transferability of the results.
- Milestone 1 delivers a first prototype of a KDD support environment; in addition, the application areas are set up and their requirements specified.
- Milestone 2 delivers multi-relational data handling and the learning of suitable settings for learning parameters. In addition, a pre-processing environment will implement meta-data for user-driven data transformations and for known learning operators.
- Milestone 3 delivers new methods for automatic pre-processing and a case-base that is used by the KDDE.
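Multi-relational data handling, listed among the milestones, often amounts to turning several linked tables into the single table that most learners expect. A common technique for this is propositionalisation by aggregation, sketched below for a one-to-many relation; the table layout, the key column `id`, the function name `propositionalise`, and the choice of count, sum and mean as aggregates are all assumptions made for illustration, not the project's method.

```python
from collections import defaultdict
from statistics import mean


def propositionalise(parents: list[dict], children: list[dict], key: str, value: str) -> list[dict]:
    """Flatten a one-to-many relation into one row per parent record by
    attaching aggregates (count, sum, mean) of a numeric child attribute."""
    # Group the child values by the key that links them to their parent
    groups = defaultdict(list)
    for child in children:
        groups[child[key]].append(child[value])
    rows = []
    for parent in parents:
        values = groups.get(parent[key], [])  # parents without children get []
        rows.append({
            **parent,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            f"{value}_mean": mean(values) if values else 0.0,
        })
    return rows


customers = [{"id": 1, "region": "north"}, {"id": 2, "region": "south"}, {"id": 3, "region": "east"}]
orders = [{"id": 1, "amount": 20.0}, {"id": 1, "amount": 30.0}, {"id": 2, "amount": 5.0}]
flat = propositionalise(customers, orders, key="id", value="amount")
for row in flat:
    print(row)
```

Setting the mean to 0.0 for a parent without children is only a convention chosen here; a real system might prefer an explicit missing-value marker.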
Funding scheme: CSC (cost-sharing contracts)