In the BOOTStrep project, we aim to build reusable, wide-coverage lexical and conceptual repositories for the biology domain. On the one hand, we want to exploit existing terminological resources (thesauri, classification systems) and combine them within a common representation framework. On the other hand, we will fill the gaps we encounter by automatically acquiring new terms and concepts from the vast literature in the field. In addition, we extract information about the linguistic properties of biology terms, which is needed before applications such as text mining can deliver high-quality results of real interest to users.
We shall define, implement and develop a lexicon that covers lexical forms from the biology domain together with their relevant linguistic information. Similar work will be carried out to arrive at an associated domain ontology. While still under development, the lexical resource can already be used to feed a sublanguage text analyser, which recognizes unknown terms, their attributes and the semantic relations they have to already known terms. This procedure incrementally enhances both the initial lexicon and the initial domain ontology. Simultaneously, the text analyser finds concrete biological facts in the documents being processed and stores them in a biological fact repository. Together, the facts and the domain ontology form a comprehensive, continuously growing biological knowledge repository.
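The incremental enhancement described above can be illustrated with a toy sketch. It is not the project's analyser: the single "X is a Y" pattern, the function name `bootstrap`, and the sample sentences are all assumptions made for illustration, and the fact repository is left out. The sketch only shows how a seed lexicon can grow over repeated passes, as terms learned in one pass make further terms recognizable in the next.

```python
import re

# Hypothetical pattern: "<term> is a/an <term>" signals an is-a relation.
# A real sublanguage analyser would use far richer linguistic evidence.
IS_A = re.compile(r"^(?P<child>[\w-]+) is an? (?P<parent>[\w-]+)\.?$")


def bootstrap(seed_lexicon, sentences):
    """Grow a lexicon and a set of is-a relations until nothing new is found."""
    lexicon = set(seed_lexicon)
    ontology = set()  # triples of the form (child, "is_a", parent)
    changed = True
    while changed:
        changed = False
        for sentence in sentences:
            match = IS_A.match(sentence.strip())
            if not match:
                continue
            child = match["child"].lower()
            parent = match["parent"].lower()
            # An unknown term is accepted only if it attaches to a known one.
            if parent in lexicon and child not in lexicon:
                lexicon.add(child)
                ontology.add((child, "is_a", parent))
                changed = True
    return lexicon, ontology


# Illustrative input only; these sentences are not taken from the project.
sentences = [
    "Hexokinase is a kinase.",
    "Kinase is an enzyme.",
]
lexicon, ontology = bootstrap({"enzyme"}, sentences)
print(sorted(lexicon))
print(sorted(ontology))
```

In this sketch the first sentence cannot be used on the first pass, because "kinase" is not yet known. Once the second sentence has added "kinase", a further pass picks up "hexokinase", which is the sense in which the lexicon and ontology are enhanced incrementally.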
Such an environment bridges work on heterogeneous forms of biological terminologies, lexicons and ontologies. It is a major step towards increased semantic interoperability for all actors involved in the biology domain (scientists, clinicians, the biotech industry and business). To further ease access to the growing body of factual biological information, we also supply a cross-lingual query interface (for English, German, Italian and French). The project will deliver reusable large-scale resources and tools.
Funding Scheme: STREP - Specific Targeted Research Project