LANGUAGE ENGINEERING - PREPARATORY ACTION FOR LINGUISTIC RESOURCES ORGANIZATION FOR LANGUAGE ENGINEERING | LE-PAROLE | Project | Fact sheet | FP4 | CORDIS

Project information

LE-PAROLE

Grant agreement ID: LE24017

Closed project

Start date 1 April 1996

End date 31 March 1997

Funded under

Specific programme of research and technological development and demonstration in the area of telematic applications of common interest, 1994-1998

Total cost

€ 2 786 520,00

EU contribution

€ 1 996 260,00

Coordinated by

Università degli Studi di Pisa
Italy

Objective

LE-PAROLE is concerned with a large-scale harmonised set of text databases (corpora) and lexica for all EU languages. These resources will have a wide range of applications, including design and testing in information technology, the production of language learning material and academic research. Each 20,000-entry lexicon will be based on a software tool extended to support both the conversion and management processes of the resulting resources. The project will produce large monolingual harmonised corpora which obey common markup conventions and are compatible with the lexicons.
Progress
During the first 9 months of activities the project has successfully started the creation of the corpora and the lexicons foreseen for the different European languages.
For each of the following languages a corpus of at least 20 million words and a lexicon of 20,000 lemmas will be produced: Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish (lexicon only) and Swedish. In addition, corpora of 20, 15 and 3 million words will be produced for Belgian French, Irish and Norwegian, respectively.
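The targets in the preceding paragraph can be tallied as a quick check. The sketch below is illustrative only: it assumes that "lexicon only" means no corpus is produced for Spanish and that the three additional corpora come without lexicons. All variable and function names are invented for this example.

```python
# Planned PAROLE resources as stated in the text above.
# Corpus sizes are minimum targets, in millions of words.
LEXICON_LANGUAGES = [
    "Catalan", "Danish", "Dutch", "English", "Finnish", "French",
    "German", "Greek", "Italian", "Portuguese", "Spanish", "Swedish",
]
LEXICON_ONLY = {"Spanish"}  # assumption: lexicon but no corpus
CORPUS_ONLY = {"Belgian French": 20, "Irish": 15, "Norwegian": 3}

MIN_CORPUS_MWORDS = 20
LEMMAS_PER_LEXICON = 20_000


def planned_totals():
    """Return (number of corpora, minimum total millions of words, total lemmas)."""
    corpus_sizes = {
        lang: MIN_CORPUS_MWORDS
        for lang in LEXICON_LANGUAGES
        if lang not in LEXICON_ONLY
    }
    corpus_sizes.update(CORPUS_ONLY)
    total_lemmas = len(LEXICON_LANGUAGES) * LEMMAS_PER_LEXICON
    return len(corpus_sizes), sum(corpus_sizes.values()), total_lemmas


if __name__ == "__main__":
    corpora, mwords, lemmas = planned_totals()
    print(f"{corpora} corpora, at least {mwords} million words, {lemmas} lemmas")
```

Under these assumptions the plan amounts to 14 corpora totalling at least 258 million words, and 12 lexicons totalling 240,000 lemmas.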
Corpora
Permission for use has been obtained from the copyright holders (publishers, newspapers, etc.). The conversion of texts from the source format to the PAROLE format, which is based on the TEI/EAGLES CES, is proceeding according to schedule. On average, 5-6 million running words are already available for the various languages. Semi-automatic tagging of part of each corpus is also under way.
All the information explicitly represented in the source texts is encoded essentially following the CES (Corpus Encoding Standard) designed by EAGLES on the basis of the TEI guidelines. 250,000 running words will be tagged at the morpho-syntactic level, following the EAGLES guidelines as instantiated by each PAROLE partner for its own language.
Each partner uses a software package of its choice to construct, mark up and tag its corpus. The compatibility and interchangeability of the various corpora are ensured by the adoption of commonly defined criteria for composition, encoding and linguistic annotation.
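To make the idea of morpho-syntactic tagging in a shared markup concrete, here is a minimal sketch of one annotated token rendered as an SGML-style element. It is not the PAROLE or CES encoding: the element name `w`, the attributes `lemma` and `msd`, and the tag value are all hypothetical, chosen only to show that any partner's tool can emit the same agreed shape.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TaggedToken:
    form: str   # word as it appears in the running text
    lemma: str  # dictionary headword
    msd: str    # morpho-syntactic tag (made-up value in the example below)


def to_markup(token: TaggedToken) -> str:
    """Render one token as an SGML-style element.

    The element and attribute names are hypothetical, not taken from
    the PAROLE or CES specifications.
    """
    return (
        f'<w lemma="{escape(token.lemma, quote=True)}" '
        f'msd="{escape(token.msd, quote=True)}">{escape(token.form)}</w>'
    )


if __name__ == "__main__":
    print(to_markup(TaggedToken(form="corpora", lemma="corpus", msd="Ncp")))
```

Whatever software a partner uses internally, agreeing on one output shape like this is what would let corpora from different sites be exchanged and processed together.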
Lexicons
The 20,000 lexical entries that will form the initial nucleus of the lexicons to be developed in the different countries have been selected. The morphological encoding is almost complete for all the languages. Encoding at the syntactic level has begun, producing the first SGML files ready to be loaded (imported) through a common filler into the common PAROLE lexical DB.
The PAROLE lexicon model is based on the results of LRE EAGLES and EUREKA GENELEX. As a result, all the lexical resources being developed are declarative, theory- and application-independent and multifunctional, and will be able to evolve easily, for example to incorporate other levels of information or to become multilingual. This approach, which meets the requirements of genericity, explicitness and variable granularity, will guarantee large-scale reusability. The model describes entries with a high level of precision so that application-dependent data models and application dictionaries can be derived from this repository of information by mapping the application model from the generic one. Coverage is 20,000 entries per language, described at the morphological and syntactic levels and, in a few cases, at the semantic level.
The availability of rather large, uniformly structured lexical resources in all the languages mentioned above will offer the users the benefits of a standardised base.
The exchange format for the lexicons, as for the corpora, is SGML: all the lexicons share the same DTD for the morphological and syntactic layers. Moreover, the use of a common set of lexicon management tools is a guarantee that all lexicons will fully conform to the model. The use of these tools is a precondition of an industrial level of quality for the volumes of data (in so many languages) that PAROLE is to deliver.
The Way Ahead
Work on the lexicons and corpora is now proceeding at full pace. During 1997 the Consortium will continue producing language resources (LR). The first drafts of guidelines for encoding corpora and lexicons (user manuals) will be prepared. These guidelines, together with the availability of data encoded according to EAGLES/TEI standards, will contribute concretely to the dissemination of these standards. The validation phase will also begin, in co-operation with ELRA.
Availability
All the lexicons will be publicly available, under conditions to be determined within the project. Each corpus will be accessible via the Internet. A subset of 3 million words of each corpus (including the tagged words) will also be 'distributable', i.e. a physical copy can be given to users. Co-operation with ELRA will be sought to this end. Restrictions on the type of usage will depend on those imposed by the copyright holders of the source texts when they authorised the inclusion of their texts in the corpus.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on natural language processing techniques. More information: European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the European Union's priorities for research and innovation.

FP4-TELEMATICS 2C - Specific programme of research and technological development and demonstration in the area of telematic applications of common interest, 1994-1998

Topic(s)

Calls for proposals are divided into topics. Each topic defines a specific subject or area that applicants' proposals should address. The description of a topic covers its detailed scope and the expected impact of the funded project.

D.12 - Language Engineering

Call for proposals

Procedure for inviting applicants to submit project proposals with the aim of receiving European Union funding.

No data

Funding scheme

A funding scheme (or "type of action") within a programme, with common features. It specifies the scope of what is funded, the reimbursement rate, the detailed criteria for assessing whether costs are eligible for funding, and the use of simplified forms of cost accounting such as lump sums.

CSC - Cost-sharing contracts

Coordinator

Università degli Studi di Pisa

EU contribution

No data

Address

Via della Faggiola 32
56100 Pisa
Italy

Total cost

No data

Participants (14)

CENTRO DE LINGUISTICA DA UNIVERSIDADE DE LISBOA

Portugal

EU contribution

No data

Address

AVENIDA 5 OUTUBRO
1050 LISBOA

Total cost

No data

DET DANSKE SPROG- OG LITTERATURSELSKAB

Denmark

EU contribution

No data

Address

18A,FREDERIKSHOLMS KANAL
1220 COPENHAGEN K

Total cost

No data

FUNDACION BOSCH GIMPERA UNIVERSITAT DE BARCELONA

Spain

EU contribution

No data

Address

BARCELONA

Total cost

No data

GOETEBORGS UNIVERSITET

Sweden

EU contribution

No data

Address

6,RENSTROMSGATAN
GOTHENBURG

Total cost

No data

GSI-ERLI

France

EU contribution

No data

Address

1,PLACE DES MARSEILLAIS
94227 CHARENTON

Total cost

No data

INSTITIUID TEANGEOLAIOCHTA EIREANN

Ireland

EU contribution

No data

Address

FITZWILLIAM PLACE
2 DUBLIN

Total cost

No data

INSTITUT D'ESTUDIS CATALANS

Spain

EU contribution

No data

Address

47,CARRER DEL CARME
08001 BARCELONA

Total cost

No data

INSTITUT NATIONAL DE LA LANGUE FRANCAISE

France

EU contribution

No data

Address

AVENUE DE LA GRILLE D'HONNEUR, LE PARC
92211 SAINT-CLOUD CEDEX

Total cost

No data

INSTITUUT VOOR NEDERLANDSE LEXICOLOGIE

Netherlands

EU contribution

No data

Address

2-3,MATTHIAS DE VRIESHOF
2311 BZ LEIDEN

Total cost

No data

Institut für Deutsche Sprache

Germany

EU contribution

No data

Address

68016 Mannheim

Total cost

No data

Institute for Language and Speech Processing (ILSP)

Greece

EU contribution

No data

Address

22,Margari Street
11525 Athens

Total cost

No data

UNIVERSITY OF BIRMINGHAM

United Kingdom

EU contribution

No data

UNIVERSITY OF HELSINKI

Finland

EU contribution

No data

Address

8,KESKUSKATU
00014 HELSINKI

Total cost

No data

UNIVERSITY OF LIEGE

Belgium

EU contribution

No data

Address

7,PLACE DU XX AOUT
4000 LIEGE

Total cost

No data