
Discovery of content descriptors for documents


Research objectives and content
In the project, methods for discovering content descriptors from the text of documents are developed and evaluated. Keywords are a typical example of content descriptors. Representative content descriptors are useful, and often required, in many application areas, particularly in information retrieval. Keyword extraction has been studied for a long time. Recent developments, however, have changed the situation: huge document collections that are available to everyone place new requirements on the methods. In this project, data mining methods are studied, with the special objective of discovering complex, structured content descriptors, e.g. phrases, and evaluating their usability in information retrieval.
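The summary does not say which data mining methods are used. As one illustration, and assuming that a phrase is approximated by a contiguous word sequence that occurs in at least a minimum number of documents, an Apriori-style level-wise search can discover such phrase candidates. This is a sketch under those assumptions, not the project's method. The names `frequent_phrases` and `_count_candidates`, the parameters `min_support` and `max_len`, and the toy documents are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]


def _count_candidates(docs: List[List[str]], n: int, prev: Set[Phrase]) -> Counter:
    """Count, for each candidate n-word sequence, how many documents contain it.

    Hypothetical helper. For n > 1 a sequence is only a candidate if both its
    (n-1)-word prefix and suffix were frequent at the previous level
    (Apriori-style pruning).
    """
    counts: Counter = Counter()
    for doc in docs:
        seen: Set[Phrase] = set()  # count each sequence at most once per document
        for i in range(len(doc) - n + 1):
            gram = tuple(doc[i : i + n])
            if n == 1 or (gram[:-1] in prev and gram[1:] in prev):
                seen.add(gram)
        counts.update(seen)
    return counts


def frequent_phrases(docs: List[List[str]], min_support: int, max_len: int = 4) -> Dict[Phrase, int]:
    """Hypothetical sketch: return all contiguous word sequences of length
    1..max_len that occur in at least `min_support` documents, mapped to
    their document frequency."""
    result: Dict[Phrase, int] = {}
    prev: Set[Phrase] = set()
    for n in range(1, max_len + 1):
        level = {g: c for g, c in _count_candidates(docs, n, prev).items() if c >= min_support}
        if not level:
            break  # no frequent n-word sequences, so no frequent longer ones either
        result.update(level)
        prev = set(level)
    return result


# Toy, made-up documents, already tokenized into words.
docs = [
    "data mining in text documents".split(),
    "text documents and data mining methods".split(),
    "information retrieval uses data mining".split(),
]
# Keep only multi-word sequences as phrase-like descriptor candidates.
phrases = {p: c for p, c in frequent_phrases(docs, min_support=2).items() if len(p) > 1}
print(sorted(phrases.items()))  # [(('data', 'mining'), 3), (('text', 'documents'), 2)]
```

The pruning is safe because document frequency is anti-monotone: every document that contains a word sequence also contains its prefix and suffix, so a sequence can only reach `min_support` if both of its shorter subsequences do. A real system would still have to rank or filter these candidates, for example by removing stopword-only sequences, before evaluating them as descriptors in information retrieval.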
Training content (objective, benefit and expected impact)
The home department has active Data Mining and Document Management research groups but no tradition of information retrieval research. Hence, the key benefit is training in standard and state-of-the-art information retrieval techniques and in conducting experiments. This combined expertise is invaluable to the new, emerging field of text data mining.
Links with industry / industrial relevance
The results of the project will be used in the Structured and Intelligent Documents project at the University of Helsinki. That project's industrial partners include major Finnish publishing and media houses, e.g. Aamulehti Group and Edita.


Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 10
72076 Tübingen

Participants (1)

Not available