Consortium on discovering knowledge with Inductive Queries

From data to knowledge

Researchers find themselves overwhelmed with data yet lack the means to retrieve the information they are interested in. To address this, an innovative approach to knowledge discovery in databases has been explored.

Digital Economy

Perhaps the 'data explosion' is the most characteristic feature of science at the start of the third millennium. From particle physics to molecular biology, from neurology to astronomy, almost all experimental sciences are experiencing an unprecedented increase in the amount and complexity of available data. These databases, accumulated through sophisticated instrumentation and ever more powerful information technology, hold an abundance of scientific knowledge. Raw data of this volume, however, do not constitute information per se and are far from easy to manage. To analyse them, the CINQ project adopted an innovative approach: to support the knowledge discovery process, intelligent data mining algorithms were developed that extract knowledge artefacts, compact and semantically rich representations of heterogeneous raw data.

Seeking tighter integration between the data and the knowledge artefacts that hold in them, the CINQ project partners employed the concept of inductive databases. In an inductive database, ordinary queries are used to access and manipulate data, while inductive queries extract patterns such as sets of items that frequently appear together (frequent itemsets) and association rules. Knowledge discovery in inductive databases therefore becomes an extended querying process, which the analyst controls by specifying the data or patterns of interest. Finding an appropriate query language was among the goals of CINQ and is being further pursued in the current IQ project, funded under the Sixth Framework Programme. Although much effort has been devoted to applying pattern queries to extract information from Web pages, CINQ focused its attention on the scientific challenges of functional genomics.
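
The article does not specify the query language or mining algorithms used, so the following is only a minimal sketch of the idea, under stated assumptions. An ordinary query (the hypothetical `select`) first restricts the data; inductive queries (the hypothetical `frequent_itemsets` and `association_rules`) then return patterns that satisfy analyst-chosen constraints, here a minimum support and a minimum confidence. Frequent itemsets are found with a plain Apriori-style level-wise search, a standard textbook technique that is not necessarily the one CINQ used. The toy basket table and all names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy table: each row is a shopping basket with a region attribute.
ROWS = [
    {"id": 1, "region": "north", "items": {"bread", "milk"}},
    {"id": 2, "region": "north", "items": {"bread", "butter", "milk"}},
    {"id": 3, "region": "north", "items": {"butter", "milk"}},
    {"id": 4, "region": "north", "items": {"bread", "butter", "milk"}},
    {"id": 5, "region": "south", "items": {"bread", "jam"}},
]


def select(rows, predicate):
    """Ordinary query: return the rows that satisfy a predicate."""
    return [row for row in rows if predicate(row)]


def frequent_itemsets(transactions, min_support):
    """Inductive query: every itemset contained in at least min_support
    transactions, found with a level-wise (Apriori-style) search."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    result = {}
    size = 1
    while level:
        counts = {s: support(s) for s in level}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        size += 1
        # Join frequent sets into candidates one item larger, keeping only
        # candidates whose every subset is frequent (support is anti-monotone).
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return result


def association_rules(freq, min_confidence):
    """Inductive query: rules lhs -> rhs with confidence >= min_confidence,
    where confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, count in freq.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                confidence = count / freq[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# The ordinary query restricts the data; inductive queries then mine patterns from it.
north = select(ROWS, lambda row: row["region"] == "north")
freq = frequent_itemsets([row["items"] for row in north], min_support=3)
rules = association_rules(freq, min_confidence=0.9)
for lhs, rhs, confidence in sorted(rules, key=lambda r: sorted(r[0])):
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
# Expected: ['bread'] -> ['milk'] 1.0 and ['butter'] -> ['milk'] 1.0
```

Changing `min_support`, `min_confidence` or the `select` predicate and re-running is the "extended querying process" in miniature: the analyst steers discovery by adjusting constraints rather than writing a new algorithm.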
Most available techniques for analysing gene expression data are based on clustering algorithms, which try to identify groups of genes whose expression is correlated across different biological situations. Because the biological validity of such groups can be questioned, exploratory data mining algorithms were proposed that seek descriptive rules in data collected by serial analysis of gene expression (SAGE) or from DNA microarrays.
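
The article does not describe these algorithms in detail. As one hedged illustration of descriptive rather than clustering-based analysis, the sketch below assumes expression values are already normalised so that a value above a chosen threshold can be read as "over-expressed". It binarises a small genes-by-situations matrix, enumerates closed gene sets (each paired with every situation in which all of its genes are over-expressed), and derives exact rules of the form "whenever gene A is over-expressed, so is gene B". The threshold, the toy matrix and all function names are hypothetical.

```python
from functools import reduce

# Hypothetical normalised expression values: gene -> situation -> value.
EXPRESSION = {
    "g1": {"s1": 2.1, "s2": 1.8, "s3": 0.2, "s4": 1.5},
    "g2": {"s1": 1.9, "s2": 1.2, "s3": 0.1, "s4": 1.7},
    "g3": {"s1": 0.3, "s2": 1.4, "s3": 1.6, "s4": 0.4},
    "g4": {"s1": 1.1, "s2": 0.2, "s3": 1.3, "s4": 1.2},
}


def binarize(expression, threshold):
    """Map each situation to the set of genes whose value exceeds threshold."""
    situations = sorted({s for values in expression.values() for s in values})
    return {
        s: frozenset(g for g, values in expression.items() if values.get(s, 0.0) > threshold)
        for s in situations
    }


def closed_gene_sets(rows):
    """All non-empty intersections of situation rows, i.e. the closed gene sets."""
    closed = set()
    for genes in rows.values():
        closed |= {genes} | {c & genes for c in closed}
    closed.discard(frozenset())
    return closed


def bi_sets(rows, min_genes, min_situations):
    """(genes, situations) pairs: genes jointly over-expressed in those situations."""
    result = []
    for genes in closed_gene_sets(rows):
        situations = frozenset(s for s, g in rows.items() if genes <= g)
        if len(genes) >= min_genes and len(situations) >= min_situations:
            result.append((genes, situations))
    return result


def exact_rules(rows):
    """gene -> other genes over-expressed in every situation where gene is."""
    rules = {}
    for gene in sorted(set().union(*rows.values())):
        supporting = [g for g in rows.values() if gene in g]
        closure = reduce(frozenset.intersection, supporting)
        if len(closure) > 1:
            rules[gene] = closure - {gene}
    return rules


rows = binarize(EXPRESSION, threshold=1.0)
patterns = bi_sets(rows, min_genes=2, min_situations=2)
rules = exact_rules(rows)
for genes, situations in sorted(patterns, key=lambda p: sorted(p[0])):
    print(sorted(genes), "over-expressed in", sorted(situations))
for gene, implied in rules.items():
    print(gene, "->", sorted(implied))
```

Unlike a clustering, which assigns each gene to a group, these patterns may overlap, and each one states explicitly in which situations it holds, which keeps the result open to biological interpretation.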