Conceptual Retrieval of Information using Semantic dicTionary in three Languages

Objective

The CRISTAL project addresses the area of text retrieval and indexing. The project will develop a multilingual (French, English and Italian) natural language interface in order to retrieve monolingual (French) text in a corpus of newspaper articles. The system will integrate linguistic methods and information retrieval techniques.

The goal is to provide access to textual information by matching query and text concepts rather than by string or keyword matching. Thus the project will provide the ability to search for an idea, without requiring knowledge of the texts examined or mastery of a cryptic query language. The project will reuse an existing conceptual dictionary, Dicologique, as well as results from other projects: SIMPR, PLUS and COBALT. The project will be carried out along two development axes.

1. The adaptation of the French conceptual dictionary, Dicologique, a device that maps natural language lexemes into concepts. This involves expanding its structure to accommodate multilingualism (English and Italian) and carrying out the semantic analysis of English and Italian subsets. The software tools needed to consult and update the conceptual dictionary will be built during the project.
2. The development of a concept-based information retrieval environment that includes an indexing module, a search engine and a dialogue management module. The interface will first accept a natural language query and then refine and disambiguate this query through a dialogue with the user. Concept-based retrieval will be investigated.

The consortium is composed of industrial partners, research organisations and a user. The user will provide real corpora and will participate in the requirements specification. The project will demonstrate and evaluate the techniques and tools through final end-user applications.

Approach and Methodology

The conceptual dictionary Dicologique is organised as a multi-hierarchical tree structure in which each leaf corresponds to a word or a phrase. A variety of types are used to characterise the nodes. These types make it possible to group concepts by topic, to build links (IS_A, SORT_OF, PART_OF...), to link near-synonyms, and to group concepts by characteristics (size, shape...).
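The structure described above might be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `ConceptDictionary`, its methods and the sample entries are hypothetical and are not taken from Dicologique itself. It stores typed links from a node to its parents, so that a node may sit in more than one hierarchy.

```python
from collections import defaultdict


class ConceptDictionary:
    """Hypothetical multi-hierarchical structure: a node may have several parents."""

    def __init__(self):
        # Maps a node to the set of (link_type, parent) pairs attached to it.
        self.parents = defaultdict(set)

    def add_link(self, child, link_type, parent):
        """Attach `child` to `parent` with a typed link such as IS_A or PART_OF."""
        self.parents[child].add((link_type, parent))

    def ancestors(self, node):
        """Return every concept reachable by following parent links upwards."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for _link_type, parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative entries only (assumed, not from the real dictionary).
dictionary = ConceptDictionary()
dictionary.add_link("sapin", "IS_A", "arbre")
dictionary.add_link("arbre", "IS_A", "plante")
dictionary.add_link("sapin", "SORT_OF", "bois")  # second parent, second hierarchy
```

Calling `dictionary.ancestors("sapin")` collects concepts from both hierarchies, which is the property that distinguishes a multi-hierarchical structure from a plain tree.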

The conceptual dictionary will be extended to multiple languages by giving English and Italian equivalents for each concept of the studied subset, on a one-to-one basis. Where exact word equivalents are not available, phrases will be used. These direct links between words will avoid creating three isomorphic semantic structures for the three languages used in the query input.
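A minimal sketch of such one-to-one links, assuming a single table of concepts that carries one lexeme per language. The concept identifiers, the table `EQUIVALENTS` and the function names are hypothetical; only the idea of attaching English and Italian equivalents to one shared structure comes from the description above.

```python
# Hypothetical concept table: one shared entry per concept, one lexeme per language.
EQUIVALENTS = {
    "C_NEWSPAPER": {"fr": "journal", "en": "newspaper", "it": "giornale"},
    "C_BANK": {"fr": "banque", "en": "bank", "it": "banca"},
    "C_TRAIN": {"fr": "train", "en": "train", "it": "treno"},
}


def build_lexeme_index(equivalents):
    """Build a reverse index from (language, lexeme) to concept identifier."""
    index = {}
    for concept_id, forms in equivalents.items():
        for lang, lexeme in forms.items():
            index[(lang, lexeme.lower())] = concept_id
    return index


LEXEME_INDEX = build_lexeme_index(EQUIVALENTS)


def french_equivalent(word, lang):
    """Return the French lexeme linked to `word`, or None if the word is unknown."""
    concept_id = LEXEME_INDEX.get((lang, word.lower()))
    if concept_id is None:
        return None
    return EQUIVALENTS[concept_id]["fr"]
```

Because every language points at the same concept entry, the semantic structure is stored once rather than three times. A real dictionary would also need to handle ambiguous lexemes and phrase equivalents, which this sketch leaves out.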

The parser will comprise a morphological analyser, a syntactic analyser and an interpretation component. The main purpose of the parsing is to disambiguate syntactic senses of words in the document texts as well as in the natural language queries. The interpretation component of the parser will then map the syntactic output to the conceptual dictionary. Ambiguities will be resolved whenever possible from the context of the document or of the query conversation; otherwise they will be resolved by questioning the user. The dialogue manager will be simpler than in other systems because the expected user responses are constrained. The demonstrator of the Esprit project PLUS is the starting point for the dialogue module.
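The fallback from context to user dialogue can be sketched as follows. This is an assumption-laden illustration, not the project's parser: the function `disambiguate`, the overlap-counting score and the sample senses are all hypothetical, and `senses` is assumed to be non-empty.

```python
def disambiguate(senses, context_concepts):
    """Pick a sense from context, or fall back to a constrained question.

    `senses` maps each candidate concept to a set of related concepts;
    `context_concepts` is the set of concepts already seen in the document
    or in the query conversation.
    """
    # Score each candidate by how many related concepts appear in the context.
    scores = {
        concept: len(related & context_concepts)
        for concept, related in senses.items()
    }
    best = max(scores.values())
    winners = sorted(c for c, score in scores.items() if score == best)
    if best > 0 and len(winners) == 1:
        return ("resolved", winners[0])
    # No decisive context: return the choices to offer the user.
    return ("ask_user", winners)


# Illustrative senses for the French word "avocat" (lawyer or avocado).
AVOCAT_SENSES = {
    "AVOCAT_JURISTE": {"tribunal", "juge", "proces"},
    "AVOCAT_FRUIT": {"salade", "fruit", "cuisine"},
}
```

Returning a fixed list of candidate senses keeps the user's reply constrained to a choice among known options, which is one way a dialogue manager could stay simple.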

To enable multilingual access to the text database, the documents will be indexed monolingually but queries will be processed multilingually. The concepts extracted from the query are replaced by their target (French) equivalents, which are then used in the indexing process. A formal notion of semantic distance will be defined during the project, and a threshold will filter out concepts that are too distant in the matching process.
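Since the formal notion of semantic distance is left to be defined during the project, the sketch below substitutes a simple stand-in: the number of links on the shortest path between two concepts. The graph, the document index and the function names are hypothetical and serve only to show how a threshold could filter matches.

```python
from collections import deque


def semantic_distance(graph, source, target):
    """Stand-in distance: number of links on the shortest path, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def match_documents(graph, query_concepts, index, threshold):
    """Return (doc_id, distance) pairs within the threshold, nearest first."""
    hits = {}
    for doc_id, doc_concepts in index.items():
        distances = [
            d
            for q in query_concepts
            for c in doc_concepts
            if (d := semantic_distance(graph, q, c)) is not None and d <= threshold
        ]
        if distances:
            hits[doc_id] = min(distances)
    return sorted(hits.items(), key=lambda item: item[1])


# Illustrative undirected concept graph and French document index (assumed).
GRAPH = {
    "journal": {"presse"},
    "presse": {"journal", "magazine", "media"},
    "magazine": {"presse"},
    "media": {"presse", "television"},
    "television": {"media"},
}
INDEX = {"doc1": {"magazine"}, "doc2": {"television"}, "doc3": {"journal"}}
```

With a threshold of 2, a query on the concept `journal` keeps `doc3` (distance 0) and `doc1` (distance 2) and drops `doc2`, whose only concept lies three links away. A different distance definition would slot into `semantic_distance` without changing the filtering step.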

Exploitation and Future Prospects

The project is carried out by an industrially based consortium and the coordinator has a solid reputation. A follow-up to the project could turn the prototype into a commercial product. The project aims at a generic application that provides electronic access to current information. Potential applications include access to remote data banks over network services such as Minitel "information kiosks" and direct access to bulk data distributed on CD-ROM. The approach is domain independent, and the system could also be adapted for public information suppliers or for engineering purposes (technical documentation, maintenance manuals, test reports, ...).

The major improvement compared to off-the-shelf products results from the combination of:

1. multilinguality; the user is able to access information in a foreign language without needing a perfect knowledge of that language,
2. the ability to access information in free natural language,
3. the ability to search for an idea as opposed to keyword matching.

The project expects scientific results in the fields of man-machine communication, dialogue management and conceptual dictionary building. It will test the theoretical models developed in previous years in a concrete commercial domain. Cooperation is foreseen with other LRE indexing and information retrieval projects.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP3-LRE - Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

Data not available

Coordinator

Cap Gemini Innovation

EU contribution

No data

Address

86/90 rue Thiers
92513 Boulogne-Billancourt
France

Total cost

No data

Participants (5)

Cap Volmac

Netherlands

EU contribution

No data

Address

Rijswijk

Total cost

No data

Consiglio Nazionale delle Ricerche (CNR)

Italy

EU contribution

No data

Address

Via della Faggiola 32
56100 Pisa

Total cost

No data

L'Europeenne de Donnees

France

EU contribution

No data

Address

Boulogne

Total cost

No data

Memodata

France

EU contribution

No data

Address

Caen

Total cost

No data

University of Manchester Institute of Science and Technology (UMIST)

United Kingdom

EU contribution

No data

Address

Sackville Street
M60 1QD Manchester

Total cost

No data


