Construction, augmentation and use of knowledge bases from natural language documents

Objective

COBALT is concerned with capturing factual and definitional knowledge from machine-readable textual sources for assertion in an existing knowledge base. The general aim of the project is to demonstrate how different state-of-the-art natural language engineering technologies and software components can be integrated into a system that supports better exploitation of information in the financial domain through enhancements to a knowledge base. A message routing application will be built to demonstrate one possible use of the tools and techniques developed in the project.

The technical goal is to improve the performance of off-the-shelf text categorisation systems by integrating current text categorisation techniques with state-of-the-art knowledge representation and selected natural language parsing and understanding techniques. The project will exploit existing technology, research results and software modules and concentrate R&D efforts on integration issues.

The aim is thus to achieve innovative European results in text categorisation by building on leading technology in that field and on state-of-the-art NLP technology. An Interest Group composed of major Italian banks will be set up in the course of the project to provide input and feedback.

The general background of the project is the widely acknowledged difficulty of setting up, augmenting and using large knowledge-based systems (LKBSs), which arises from the impossibility of manually encoding all the information to be stored (the so-called knowledge acquisition bottleneck).

The project rests on two assumptions:
for many LKBS applications, a great deal of the necessary knowledge already exists as printed or computer-readable text;
the state of the art in artificial intelligence (AI) and computational linguistics (CL) makes it possible to process such texts (semi-)automatically and to translate their basic semantic content into a knowledge representation language (KRL) suitable for many different high-level LKBS applications.

The prototype COBALT system will classify and store each new item in a KB according to a defined hierarchical structure of categories to be used for application specific storing, summarising and retrieval tasks. From a functional point of view the prototype will thus belong to the class of Text Categorisation systems. The basic idea is to exploit text categorisation for a first-level broad categorisation of items and for selecting relevant text portions to be analysed later with natural language understanding techniques (parsing and semantic analysis). The combined results of the two analyses will enhance the original KB and thus constitute the basis for a very accurate, second-level category assignment activity.
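The two-level flow described above can be illustrated with a minimal sketch. It is not the COBALT implementation: the category names, the trigger terms, the verb lexicon and all function names are hypothetical, and the `analyse` function is only a naive stand-in for real parsing and semantic analysis.

```python
from dataclasses import dataclass, field

# Hypothetical first-level categories and trigger terms (illustrative only).
BROAD_CATEGORIES = {
    "mergers": {"merger", "acquires", "takeover"},
    "markets": {"shares", "bonds", "index"},
}

# Hypothetical verb lexicon standing in for parsing and semantic analysis.
EVENT_VERBS = {"acquires": "acquisition", "buys": "acquisition"}


@dataclass
class KnowledgeBase:
    """Toy knowledge base: a flat list of asserted facts."""
    facts: list = field(default_factory=list)

    def assert_fact(self, fact: dict) -> None:
        self.facts.append(fact)


def first_level(text: str) -> dict:
    """Broad categorisation that also selects the relevant sentences."""
    selected = {}
    for sentence in (s.strip() for s in text.split(".")):
        words = {w.strip(",;:").lower() for w in sentence.split()}
        for category, terms in BROAD_CATEGORIES.items():
            if words & terms:
                selected.setdefault(category, []).append(sentence)
    return selected


def analyse(sentence: str):
    """Naive stand-in for deep analysis: split the sentence around a known verb."""
    words = sentence.split()
    for i, word in enumerate(words):
        event = EVENT_VERBS.get(word.lower())
        if event and 0 < i < len(words) - 1:
            return {
                "event": event,
                "agent": " ".join(words[:i]),
                "object": " ".join(words[i + 1:]),
            }
    return None


def categorise(text: str, kb: KnowledgeBase) -> list:
    """Second-level assignment: refine each broad category with asserted facts."""
    labels = []
    for category, sentences in first_level(text).items():
        events = set()
        for sentence in sentences:
            fact = analyse(sentence)
            if fact:
                kb.assert_fact(fact)  # the analysis result enhances the KB
                events.add(fact["event"])
        labels.extend(f"{category}/{event}" for event in sorted(events))
        if not events:
            labels.append(category)  # fall back to the broad category
    return labels
```

Under these assumptions, calling `categorise("Alpha Bank acquires Beta Leasing. Shares rose on the news.", KnowledgeBase())` assigns a refined label to the first sentence and keeps only the broad label for the second, since the stand-in analysis extracts no fact from it.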

A running demonstration will be developed in the first phases of the project and will evolve to further levels of complexity through an incremental life cycle. The base language of the prototype system will be English. A feasibility study on adapting the prototype to other languages and other application domains is envisaged within the project.

Currently the European IT industry has no significant presence in text categorisation technology and its applications: the main products come from the USA. COBALT will start from state-of-the-art results and extend them with new technologies and with greater benefits and functionality.

The main results of COBALT will be:

a practical test of how current state-of-the-art approaches in NLP and AI technology can support the transfer of knowledge recorded in natural language texts (in the financial domain) into knowledge bases;
the definition and prototype implementation of a basic technology for advanced text categorisation tasks.

Experimentation will be carried out in the intelligent routing of online financial news, so the R&D activity in COBALT can be expected to be exploited directly in a new generation of intelligent routing applications. The basic technology, however, will also be able to support the development of a wide range of applications based on text categorisation. Some potential application areas are:

automated information filtering and routing, extended to domains other than the financial one, for people and companies that use information from news wire feeds or generic text;
text classification for information vending services, performed either automatically or interactively with domain experts;
intelligent navigation of large archives and databases, for example CD-ROM navigation with simple hypertext capabilities derived from a KB description and structuring of the CD contents.

Quinary plans to exploit the project results in two ways:

product development: the new COBALT product for information routing will be added to Quinary's existing banking product line;
technology transfer services: the project results are a natural extension of the collection of technologies that Quinary is able to support in its consulting, training and systems development services.

UMIST will exploit the results internally to enhance its teaching of natural language processing and knowledge engineering techniques, and as background technology for future research and development projects such as robust text processing. STEP Informatique envisages enhancing the Legal Advisory Systems developed in the ESPRIT II NOMOS Project (in which it is a partner) with COBALT-derived techniques, and is ready to take part in creating a commercial product of this type engineered from the results of the COBALT project.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP3-LRE - Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

Data not available

Coordinator

Quinary SpA

EU contribution

No data

Address

Via Crivelli 51/1
20121 Milano
Italy

Total cost

No data

Participants (2)

Step Informatique

France

EU contribution

No data

Address

20 rue Martel
75010 Paris

Total cost

No data

University of Manchester Institute of Science and Technology (UMIST)

United Kingdom

EU contribution

No data

Address

Sackville Street
M60 1QD Manchester

Total cost

No data
