CORDIS - EU research results
Content archived on 2022-12-23

GRAMLEX

Objective



The first step in processing a text or corpus in any natural language is usually lexical tagging. It is probably the most basic and the most general-purpose operation in natural language processing. The quality, and even the feasibility, of further processing depends on the quality of lexical analysis. The data and algorithms needed to perform this task with acceptable accuracy on unrestricted text include: forms or lemmata with a formal characterization of their morphological variations; lexical units defined on morphological grounds, i.e. homographs are represented in one lexical unit; and algorithms that produce tags encoding the lexical category of each form.
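
As a rough illustration of what dictionary-based lexical tagging involves, the following minimal sketch looks up each token in a table of inflected forms and returns every (lemma, tag) analysis found. The lexicon entries, the tag names and the functions `tokenize` and `tag` are assumptions made for this example only; they are not the GRAMLEX formats or tools.

```python
import re

# Hypothetical toy lexicon: inflected form -> list of (lemma, tag) analyses.
# Entries and tag names are invented for illustration, not project data.
LEXICON = {
    "the": [("the", "DET")],
    "dog": [("dog", "NOUN:sg")],
    # A homographic form carries several analyses.
    "walks": [("walk", "VERB:3sg"), ("walk", "NOUN:pl")],
}

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def tag(text):
    """Return (token, analyses) pairs; unknown tokens get an empty list."""
    return [(token, LEXICON.get(token, [])) for token in tokenize(text)]

if __name__ == "__main__":
    for token, analyses in tag("The dog walks"):
        print(token, analyses)
```

A form with several analyses, such as "walks" here, is left ambiguous by the lookup; choosing among the analyses is a separate disambiguation step.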

There is thus a strong need for electronic language resources and engineering standards in the morphology of European languages. The aim of GRAMLEX is to facilitate the initiation, coordination and standardisation of the construction of morphological dictionary packages for the essential part of several European languages, including a detailed formal description of the morphology of these languages. The major challenges in such an enterprise are to give the description the largest possible coverage, in order to be able to process unrestricted text; to share as much as possible of the formats, methods and algorithms; and to improve the time and space efficiency of the programs.

Our approach is to tackle in parallel several aspects of the problem:
The contents, form and use of morphological lexical data. The form of lexical data will be examined and evaluated according to several criteria, including their use for generation and recognition and their suitability for standardisation. The contents of dictionaries will be checked against text corpora. Analysis of the tokens that are not recognized will provide feedback on those contents. In order to take account of the specific features of technical texts (e.g. in terminology, the multilingual structuring of dictionaries is by nature easier than in general language), the terminology of telecommunications was chosen as an application field. The use of grammatical information for lexical disambiguation will be tested. Feedback on the grammatical contents of the tags is expected.
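
Checking a dictionary against a corpus can be sketched as a coverage measurement: count how many corpus tokens the dictionary recognizes and list the unrecognized ones by frequency, so that the most frequent gaps can be reviewed first. The function `coverage_report`, the naive tokenizer and the sample data are assumptions for illustration, not the project's actual procedure.

```python
import re
from collections import Counter

def coverage_report(corpus_text, known_forms):
    """Return (coverage rate, unrecognized tokens sorted by frequency).

    known_forms is a set of lowercase inflected forms from the dictionary.
    """
    tokens = re.findall(r"\w+", corpus_text.lower())
    unknown = Counter(t for t in tokens if t not in known_forms)
    recognized = len(tokens) - sum(unknown.values())
    rate = recognized / len(tokens) if tokens else 0.0
    return rate, unknown.most_common()

if __name__ == "__main__":
    # Invented sample: a tiny dictionary and a telecommunications-style sentence.
    forms = {"the", "sends"}
    rate, missing = coverage_report("The modem sends the packet", forms)
    print(f"coverage: {rate:.0%}")
    print("unrecognized:", missing)
```

In this sample the unrecognized tokens are technical terms, which is the kind of feedback that would suggest additions to a terminology dictionary.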

The four languages of the project, namely French, Hungarian, Italian and Polish, make up a benchmark for the coordination and standardization of methods and data in closely related, less closely related, and unrelated languages. The co-operation with the Hungarian partners will be a first attempt at comparing RELEX methods, initially devised for Indo-European languages, with methods used for a non-Indo-European language with a very different morphological system.

The project will produce lexical resources designed for computer applications on unrestricted text, including technical texts. These resources will be available for research projects and other activities. By increasing their know-how and knowledge in lexical resources, the participants will promote the commercial interest of such resources.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

This project has not yet been classified with EuroSciVoc.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

CSC - Cost-sharing contracts

Coordinator

Association pour le Traitement Informatique des Langages Formels et Naturels (ASSTRIL)
EU contribution
No data
Address
Place Jussieu 2
75251 Paris Cedex 5
France


Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

No data

Participants (5)
