Multi-language pronunciation dictionary of proper names and place names

Project information

ONOMASTICA

Grant agreement ID: LRE61004

Closed project

Start date 1 January 1993

End date 1 January 1995

Funded under

Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Total cost

No data

EU contribution

No data

Coordinated by

University of Edinburgh
United Kingdom

Objective

This project seeks to create a set of pronunciation lexicons of European names (city and town names, street names, family names, company names, product names) in a machine-assisted fashion whereby expert linguists (phoneticians/lexicographers) will carry out editorial preparation of the lexicons using customised workstation software.

The objective of the project is to make available, for wide-scale exploitation, quality-controlled pronunciation lexicons in machine-readable form (CD-ROM) for use in automated language systems. These are of primary interest to European companies in the telecommunications sector and in the European (dictionary) publishing industry.

An important sub-goal of the project will be the preparation of a set of letter-to-sound rules specific to the names of each language. The growing pronunciation lexicon will be used to extend the rule set and will also be used to train self-learning software.

A total of nine current languages of the European Community will be addressed in the project: Danish, Dutch, English, French, German, Greek, Italian, Portuguese and Spanish. The aim over the two-year project is to derive pronunciation dictionaries for up to 1,000,000 names per language and to investigate the problems of exchanging national names among the partners, so as to create a matrix of 'nativised' pronunciations in which each name is given a pronunciation in every other language, where it is a foreign name.
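The matrix of nativised pronunciations can be pictured as a table with one row per (name, language of origin) pair and one column per target language. The following is a minimal sketch under assumptions: the two-letter language codes, the function names and the placeholder transcription string are illustrative and do not come from the project.

```python
# Two-letter codes for the nine project languages (the codes are an assumption
# made for this sketch; the project text only lists the language names).
LANGUAGES = ["da", "nl", "en", "fr", "de", "el", "it", "pt", "es"]


def empty_matrix(names_by_origin):
    """Build {(name, origin): {target_language: None}} with one cell per language."""
    matrix = {}
    for origin, names in names_by_origin.items():
        for name in names:
            matrix[(name, origin)] = {target: None for target in LANGUAGES}
    return matrix


def missing_cells(matrix):
    """List every (row key, target language) pair that still lacks a pronunciation."""
    return [
        (key, target)
        for key, row in matrix.items()
        for target, pronunciation in row.items()
        if pronunciation is None
    ]


# Hypothetical names; the transcription is a placeholder, not real phonetic data.
matrix = empty_matrix({"de": ["München"], "it": ["Firenze"]})
matrix[("München", "de")]["de"] = "<native transcription>"
print(len(missing_cells(matrix)))  # 2 names x 9 languages, minus 1 filled cell
```

Keeping the language of origin in the row key lets the native pronunciation sit on the diagonal while the other eight cells hold the nativised variants.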

The approach in the project is to define a dictionary consisting of names and their phonetic representation. To create this dictionary, a lexicographer will initially select, from a names list provided by the relevant industrial participant (see below), the 20,000 to 50,000 most frequently used names for the language. The dictionary will then be generated directly by hand, by an expert phonetician transcribing the conventional pronunciations of the names.
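The selection step can be sketched as a frequency count over the supplied names list. This is only an illustration: the function name, the sample list and the assumption that the list contains one entry per occurrence of a name are not taken from the project description.

```python
from collections import Counter


def most_frequent_names(names, limit):
    """Return up to `limit` distinct names, most frequent first."""
    counts = Counter(name.strip() for name in names if name.strip())
    return [name for name, _ in counts.most_common(limit)]


# Hypothetical sample; a real list would come from the industrial partner.
sample = ["Smith", "Jones", "Smith", "Brown", "Smith", "Jones"]
print(most_frequent_names(sample, 2))  # ['Smith', 'Jones']
```

In the project the limit would lie between 20,000 and 50,000, and the selected names would then go to the phonetician for hand transcription.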

On the basis of the lexicon represented by this initial sample of 20000-50000 names, it will be possible to write an initial set of grapheme-to-phoneme conversion rules, and evaluate these rules in preparation for (semi)-automation of the further lexicographic work. An alternative preprocessor based on neural computing methods will also be investigated. Definition of the conversion standards and linguistic systems to be used for grapheme-to-phoneme conversion in each language, to ensure compatibility, will be one of the early tasks for the project.
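As a rough illustration of writing rules and evaluating them against the hand-made lexicon, the sketch below applies a toy rule table by greedy longest match and measures how many names it transcribes exactly as the reference does. The rule table, the phone symbols and the three lexicon entries are invented for the example; they are not the project's rules or notation.

```python
# Toy grapheme-to-phoneme rules (assumed for illustration only).
RULES = {
    "ph": "f", "th": "T",
    "a": "a", "e": "e", "h": "h", "i": "i", "l": "l",
    "m": "m", "o": "o", "p": "p", "s": "s", "t": "t",
}

# Toy hand-transcribed reference lexicon (also assumed).
LEXICON = {"Philip": "f i l i p", "Thomas": "t o m a s", "Seth": "s e T"}


def transcribe(name, rules):
    """Convert a name to phones, trying the longest matching grapheme first."""
    text = name.lower()
    max_len = max(len(grapheme) for grapheme in rules)
    phones = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            phones.append("?")  # no rule covers this letter
            i += 1
    return " ".join(phones)


def failures(lexicon, rules):
    """Names whose rule output differs from the hand transcription."""
    return [name for name, ref in lexicon.items() if transcribe(name, rules) != ref]


def word_accuracy(lexicon, rules):
    """Fraction of names transcribed exactly as in the reference lexicon."""
    return 1 - len(failures(lexicon, rules)) / len(lexicon)


print(failures(LEXICON, RULES))       # ['Thomas']: the "th" rule overgeneralises
print(word_accuracy(LEXICON, RULES))  # two of three names match
```

The failure list is the useful output here: each mismatch points either to a rule that needs refining or to a name that should stay in the exceptions dictionary.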

The primary deliverables from this project will be a set of multi-language, machine readable CD-ROM pronunciation dictionaries for European city and town names, street names, family names, company names and product names. These dictionaries could be made available to the European Language Industry for commercial exploitation on a royalty basis.

The results from the project, in the form of machine-readable lexicons prescribing the pronunciation of names, will constitute a valuable linguistic resource which allows natural language products to handle names correctly. Benefits will be felt in systems such as future map information systems which can recognise and synthesise names accurately; future automated directory enquiry systems which can provide telephone numbers using advanced machine dialogues, recognising the desired name and address; and systems such as talking newspapers and books (for the blind) which can accommodate occurrences of names without pronunciation errors.

The project has been designed to be operated by a group of nine academic partners with nine Associated Partners from the European telecommunications industry. This group in itself represents a major user group for potential downstream exploitation of the results of the project.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on natural language processing techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU's priorities for research and innovation.

FP3-LRE - Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposals

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding scheme

Funding scheme (or "type of action") inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; the specific evaluation criteria to qualify for funding; and the use of simplified forms of costs such as lump sums.

Data not available

Coordinator

University of Edinburgh

EU contribution

No data

Address

80 South Bridge
EH1 1HN Edinburgh
United Kingdom

Total cost

No data

Participants (8)

Department of Electrical Engineering, University of Patras

Greece

EU contribution

No data

Address

Kato Kastritsi
26500 Patras

Total cost

No data

Inst Nac de Eng de Sistemas Speech Comput

Portugal

EU contribution

No data

Address

9 Rua Alves Redol
1000 Lisbon

Total cost

No data

Katholieke Universiteit Nijmegen

Netherlands

EU contribution

No data

Address

1 Wundtlaan
6525 XD Nijmegen

Total cost

No data

Speech Technology Centre, Aalborg University

Denmark

EU contribution

No data

Address

Frederik Bajers Vej 7
9220 Aalborg

Total cost

No data

Technische Universität Berlin

Germany

EU contribution

No data

Address

Einsteinufer 25
10587 Berlin

Total cost

No data

Universidad Politécnica de Madrid

Spain

EU contribution

No data

Address

Ciudad Universitaria
28040 Madrid

Total cost

No data

Università degli Studi di Pisa

Italy

EU contribution

No data

Address

Via della Faggiola 32
56100 Pisa

Total cost

No data

École Nationale Supérieure des Télécommunications

France

EU contribution

No data

Address

46 rue Barrault
75634 Paris

Total cost

No data
