Community Research and Development Information Service - CORDIS

Construction, augmentation and use of knowledge bases from natural language documents

The COBALT project demonstrates how different state-of-the-art language engineering technologies can be integrated into a system that supports better exploitation of information through fine-grained categorization of natural language texts. The technical approach integrates shallow analysis techniques based on pattern matching with syntactic and semantic analysis of natural language texts, supported by a knowledge representation language. Shallow analysis performs an initial, broader categorization and provides speed and focusing capabilities, while integration with natural language processing (NLP) techniques provides in-depth understanding of text contents and information extraction capabilities. Knowledge representation (KR) techniques provide the glue for integration as well as the foundation for knowledge-aware applications.
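
The two-stage idea described above (a fast shallow pass that focuses a slower, deeper pass) can be illustrated with a minimal Python sketch. This is not the COBALT implementation: the category names, the trigger patterns, and the functions shallow_pass, deep_pass and categorize are all hypothetical, and a single regular expression stands in for the syntactic and semantic analysis that the project performs with NLP and KR components.

```python
import re
from dataclasses import dataclass, field

# Hypothetical coarse categories and trigger patterns (assumptions, not COBALT's).
SHALLOW_PATTERNS = {
    "merger": re.compile(r"\b(merger|acquires?|acquisition|takeover)\b", re.IGNORECASE),
    "earnings": re.compile(r"\b(profit|earnings|dividend|net income)\b", re.IGNORECASE),
}

# Stand-in for deep analysis: one regex that pulls out buyer and target names.
# A real system would use syntactic and semantic analysis here.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w+) (?:acquires|acquired|to acquire) (?P<target>[A-Z]\w+)"
)


@dataclass
class Categorization:
    coarse: list                                 # categories from the shallow pass
    facts: dict = field(default_factory=dict)    # details from the deep pass


def shallow_pass(text):
    """Fast first stage: coarse categories whose pattern occurs in the text."""
    return [name for name, pattern in SHALLOW_PATTERNS.items() if pattern.search(text)]


def deep_pass(text, coarse):
    """Slower second stage, run only for categories the shallow pass selected."""
    facts = {}
    if "merger" in coarse:
        match = ACQUISITION.search(text)
        if match:
            facts["merger"] = {
                "buyer": match.group("buyer"),
                "target": match.group("target"),
            }
    return facts


def categorize(text):
    """Shallow pass first; the deep pass is skipped when nothing matched."""
    coarse = shallow_pass(text)
    facts = deep_pass(text, coarse) if coarse else {}
    return Categorization(coarse=coarse, facts=facts)


if __name__ == "__main__":
    print(categorize("Alpha acquires Beta in a cash deal"))
    print(categorize("Weather stays mild this week"))
```

The design point the sketch tries to capture is that texts matching no shallow pattern never reach the expensive stage, which is where the speed and focusing benefits mentioned above would come from.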

Project results include an empty categorization shell equipped with generic tools and with linguistic resources for English texts, a methodology for building specific applications, and a demonstrator that serves as an evaluation testbed for the chosen approach. The demonstrator is a filtering application for financial news that categorizes news distributed by a Reuters datafeed and makes it available for routing and retrieval. The end users of the prototype are analysts and brokers in the stock market sector of an Italian merchant bank, which took an active part in defining the demonstrator requirements and in evaluating the demonstrator. The evaluation showed the approach to be sound and valuable, and the demonstrator is a good basis for implementing an industrial-strength prototype.


Gianluigi ROCCA (Account Director)
Tel.: +39-02-58302712
Fax: +39-02-58305374