Reduction of Noise and Silence in Full Text Retrieval Systems for Legal Texts

Objective

The problems of accessing texts in a large textual database are not restricted to the legal world, although they are as acute here as in most other spheres of activity. Existing retrieval mechanisms require users to formulate their queries as strings of keywords that appear in an inverted list (a primitive index). This restricts how the texts can be used, because it ignores some properties of the legal sublanguage.

The RENOS project aims to develop software modules that can be integrated into existing Full Text Retrieval Systems (FTRS) to reduce the levels of "noise" and "silence" these systems produce when applied to legal texts. "Noise" is the retrieval of texts of little or no relevance to a user's query, while "silence" is the failure to retrieve relevant texts from the database. The software modules will implement a semi-automatic methodology, combining statistical methods with morphological and linguistic analysis, for identifying legal terms (single-word and compound) in legal texts originating from several European member states.
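The definitions of noise and silence can be made concrete for a single query. The sketch below is not the project's evaluation procedure; it assumes noise is measured as the share of retrieved texts that are irrelevant (1 minus precision) and silence as the share of relevant texts that were missed (1 minus recall). The function name `noise_and_silence` and the document ids are hypothetical.

```python
def noise_and_silence(retrieved: set, relevant: set) -> tuple:
    """Return (noise, silence) for one query as fractions in [0, 1].

    Assumed interpretation: noise = 1 - precision, silence = 1 - recall.
    Returning 0.0 for an empty set is a convention chosen for this sketch.
    """
    # Irrelevant share of what the system returned.
    noise = len(retrieved - relevant) / len(retrieved) if retrieved else 0.0
    # Relevant share that the system failed to return.
    silence = len(relevant - retrieved) / len(relevant) if relevant else 0.0
    return noise, silence


retrieved = {"d1", "d2", "d3", "d4"}   # hypothetical system output
relevant = {"d2", "d4", "d5"}          # hypothetical relevance judgements
noise, silence = noise_and_silence(retrieved, relevant)
print(f"noise={noise:.2f} silence={silence:.2f}")  # noise=0.50 silence=0.33
```

Under this reading, a keyword index that misses synonyms and inflected forms raises silence, while matching a word in its general-language sense raises noise.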

Approach and Methodology

The project's approach is to create an "intelligent inverted list" comprising a lexicon of single-word and compound terms, a hierarchically arranged conceptual network and a constituent grammar. Lexicon entries will be linked to nodes in the network, and these nodes ("concepts") will form the basis of text retrieval. The constituent grammar will provide linguistic criteria for identifying compound terms and ambiguous terms, i.e. words used both as legal terms and in their general-language sense.
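The two grammar tasks named above can be sketched as simple rules over part-of-speech tags and nearby words. This is only an illustration under stated assumptions: the tag set, the patterns in `COMPOUND_PATTERNS`, the cue words in `LEGAL_CUES` and both function names are hypothetical stand-ins, not the RENOS constituent grammar.

```python
# Hypothetical part-of-speech patterns standing in for constituent-grammar
# rules; the tag names and rules are illustrative, not the RENOS grammar.
COMPOUND_PATTERNS = [
    ("ADJ", "NOUN"),           # e.g. "civil liability"
    ("NOUN", "PREP", "NOUN"),  # e.g. "breach of contract"
    ("NOUN", "NOUN"),          # e.g. "court order"
]

# Hypothetical context cues that signal the legal sense of an ambiguous word.
LEGAL_CUES = {"action": {"bring", "brought", "court", "plaintiff", "dismissed"}}


def find_compounds(tagged, lexicon):
    """Return lexicon compounds whose tag sequence matches a pattern.

    tagged:  list of (word, tag) pairs for one sentence.
    lexicon: set of known compound terms, lower-cased and space-separated.
    """
    found = []
    for start in range(len(tagged)):
        for pattern in COMPOUND_PATTERNS:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) != pattern:
                continue  # also rejects windows cut short at sentence end
            phrase = " ".join(word.lower() for word, _ in window)
            if phrase in lexicon:
                found.append(phrase)
    return found


def is_legal_sense(tokens, index, window=3):
    """Treat tokens[index] as a legal term if a cue word occurs within
    `window` tokens on either side of it."""
    cues = LEGAL_CUES.get(tokens[index].lower(), set())
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    context = {tokens[i].lower() for i in range(lo, hi) if i != index}
    return bool(cues & context)


sentence = [("The", "DET"), ("breach", "NOUN"), ("of", "PREP"),
            ("contract", "NOUN"), ("created", "VERB"),
            ("civil", "ADJ"), ("liability", "NOUN")]
print(find_compounds(sentence, {"breach of contract", "civil liability"}))
# ['breach of contract', 'civil liability']
print(is_legal_sense("The plaintiff brought an action".split(), 4))  # True
print(is_legal_sense("Swift action was taken".split(), 1))           # False
```

Requiring both a pattern match and a lexicon hit keeps the rules from proposing arbitrary noun sequences as terms; a real grammar would use richer constituent structure than a flat tag window.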

The lexicon will contain a frame-based representation of single-word and compound legal terms, stored by their stems together with pointers to inflectional patterns. Nodes in the conceptual network will be semantic classes of legal terms ("concepts") organized in a tree structure. Pointers will link lexical entries to concepts in the network, with synonymous terms pointing to the same node. The constituent grammar will contain rules for identifying compound terms and for determining, in context, whether a single-word term is used in its legal or its general sense.
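The lexicon and network described above can be pictured as two linked structures: stem entries that point to an inflectional pattern and to a concept, and a tree of concept nodes. The following is a minimal sketch assuming suffix-only inflection; the classes `Concept` and `LexEntry`, the pattern table `INFLECTIONS` and the sample concepts are all hypothetical, not the project's actual data model.

```python
class Concept:
    """A node in the conceptual network, a tree of semantic classes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Concept(name, parent=self)
        self.children.append(child)
        return child

    def ancestors(self):
        """Names of the broader concepts above this node, nearest first."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Hypothetical inflectional patterns: pattern id -> endings added to a stem.
INFLECTIONS = {"N1": ["", "s"], "N2": ["y", "ies"]}


class LexEntry:
    """A lexicon entry: a stem, a pointer to its inflectional pattern,
    and a pointer to a concept node."""

    def __init__(self, stem, pattern, concept):
        self.stem, self.pattern, self.concept = stem, pattern, concept


def build_lookup(entries):
    """Expand each stem with its endings into a word-form -> Concept map."""
    lookup = {}
    for entry in entries:
        for ending in INFLECTIONS[entry.pattern]:
            lookup[entry.stem + ending] = entry.concept
    return lookup


law = Concept("law")
obligations = law.add_child("civil law").add_child("obligations")
contract = obligations.add_child("contract")
liability = obligations.add_child("liability")

# "contract" and "agreement" are synonyms, so both point to the same node.
lexicon = [LexEntry("contract", "N1", contract),
           LexEntry("agreement", "N1", contract),
           LexEntry("liabilit", "N2", liability)]
lookup = build_lookup(lexicon)

print(lookup["agreements"].name)        # contract
print(lookup["liabilities"].name)       # liability
print(lookup["contracts"].ancestors())  # ['obligations', 'civil law', 'law']
```

Storing stems plus a pattern pointer keeps the lexicon compact, while the expanded lookup lets any inflected or synonymous form resolve to one concept node.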

The network components will be built manually in the prototype system, after an initial set of terms has been extracted automatically from a corpus of legal texts containing legislation common to Community countries. Part of the software will establish links between network concepts (nodes) and the corpus by applying grammar rules to appropriate corpus segments. Another part will implement a mini Text Retrieval System that uses the intelligent inverted list to demonstrate its benefits over traditional text retrieval methods. Evaluation stages will quantify the performance of the RENOS system relative to existing FTRSs.
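The difference between a traditional inverted list and one keyed by concepts can be shown on a toy collection. This sketch assumes a hand-written word-to-concept map and assumes that a query also retrieves texts filed under narrower concepts; `WORD_TO_CONCEPT`, `NARROWER`, the concept ids and the documents are hypothetical, and the real mini TRS may behave differently.

```python
from collections import defaultdict

# Hypothetical word-form -> concept-id map produced by the lexicon
# (synonyms and inflected forms share one concept id).
WORD_TO_CONCEPT = {
    "contract": "C_CONTRACT", "contracts": "C_CONTRACT",
    "agreement": "C_CONTRACT", "agreements": "C_CONTRACT",
    "tenancy": "C_LEASE", "lease": "C_LEASE", "leases": "C_LEASE",
}
# Hypothetical narrower-concept links taken from the concept tree.
NARROWER = {"C_CONTRACT": ["C_LEASE"]}


def tokenize(text):
    return [t.strip(".,;:").lower() for t in text.split()]


def build_string_index(docs):
    """Traditional inverted list: exact word form -> document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def build_concept_index(docs):
    """Concept-based inverted list: concept id -> document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            concept = WORD_TO_CONCEPT.get(tok)
            if concept:
                index[concept].add(doc_id)
    return index


def expand(concept):
    """Return the concept plus every narrower concept beneath it."""
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(NARROWER.get(c, []))
    return result


def concept_search(index, query_word):
    concept = WORD_TO_CONCEPT.get(query_word.lower())
    if concept is None:
        return set()
    hits = set()
    for c in expand(concept):
        hits |= index.get(c, set())
    return hits


docs = {
    "d1": "The agreement was signed by both parties.",
    "d2": "Contracts must be in writing.",
    "d3": "The lease expires next year.",
    "d4": "Weather conditions were poor.",
}
string_index = build_string_index(docs)
concept_index = build_concept_index(docs)
print(sorted(string_index.get("contract", set())))        # []
print(sorted(concept_search(concept_index, "contract")))  # ['d1', 'd2', 'd3']
```

For the query "contract" the string index returns nothing (total silence), whereas the concept index also finds the synonym, the plural and the narrower concept. Whether narrower concepts should be included by default is a design choice that trades silence against noise.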

Exploitation and Future Prospects

The end result of the RENOS project will be a piece of software that, with some additional development work, could support a multilingual FTRS. The two private companies in the consortium, both legal information providers, plan to exploit it directly: Databank S.A. will explore incorporating the tools and methodologies into the NOMOS database, and SOGEI will attempt to integrate the conceptual legal term network into some of its existing products and services.

The collection of legal terms in three European languages is a key feature of the project, together with the evaluation and refinement of automated tools for the acquisition of terminological resources by statistical means. Extension to other languages and subject areas (engineering standards, medical texts) is envisaged.
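The page does not say which statistical measures the project uses for term acquisition. As one common example, candidate two-word terms can be ranked by pointwise mutual information (PMI); the sketch below uses PMI purely as an illustrative substitute, and the function name `pmi_bigrams`, the `min_count` threshold and the toy corpus are assumptions.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))). Pairs seen fewer than
    `min_count` times are skipped, because PMI overrates rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


corpus = ("the civil liability of the tenant and "
          "the civil liability of the landlord").split()
for pair, score in pmi_bigrams(corpus):
    print(" ".join(pair), round(score, 2))
```

In this toy corpus "liability of" scores exactly as high as "civil liability", which illustrates why a purely statistical ranking is usually filtered by linguistic patterns and manual review, in line with the semi-automatic methodology described above.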

Incorporating the intelligent inverted list demonstrated in RENOS into existing FTRSs will greatly improve their query mechanisms, and the RENOS system could eventually be commercialized through direct sales to text retrieval companies and information providers.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP3-LRE - Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

Data not available

Coordinator

Databank S.A.

EU contribution

No data

Address

124, Kifissias Ave. & Iatridou St
11526 Athens
Greece

Total cost

No data

Participants (5)

CEF Management Research Centre

Denmark

EU contribution

No data

Total cost

No data

INTRASOFT S.A.

Greece

EU contribution

No data

Total cost

No data

Institute for Language and Speech Processing (ILSP)

Greece

EU contribution

No data

Total cost

No data

Istituto di Linguistica Computazionale

Italy

EU contribution

No data

Total cost

No data

Società Generale d'Informatica SpA

Italy

EU contribution

No data

Total cost

No data