Multi-language pronunciation dictionary of proper names and place names

Start date:1993-01-01

End date:1995-01-01

Project Acronym:ONOMASTICA

Project status:Completed

Coordinator

Organization name:University of Edinburgh
Administrative contact Address
Name:M A JACK  Centre for Speech Technology Research
80 South Bridge
EH1 1HN
Edinburgh
UNITED KINGDOM

Region:SCOTLAND BORDERS-CENTRAL-FIFE-LOTHIAN-TAYSIDE Lothian
Tel:+44-31-6502783
Fax:+44-31-2262730

Description


Objective: This project seeks to create a set of pronunciation lexicons of European names (city and town names, street names, family names, company names, product names) in a machine-assisted fashion: expert linguists (phoneticians and lexicographers) will carry out editorial preparation of the lexicons using customised workstation software.

The objective of the project is to make available, for wide-scale exploitation, quality-controlled pronunciation lexicons in machine-readable form (CD-ROM) for use in automated language systems. These lexicons are of primary interest to European companies in the telecommunications sector and in the European (dictionary) publishing industry.

An important sub-goal of the project will be the preparation of a set of letter-to-sound rules specific to the names of each language. The growing pronunciation lexicon will be used to extend the rule set and will also be used to train self-learning software.

A total of nine current languages of the European Community will be addressed in the project: Danish, Dutch, English, French, German, Greek, Italian, Portuguese and Spanish. The aim over the 2-year project is to derive pronunciation dictionaries for up to 1,000,000 names per language and to investigate the problems of exchanging national names amongst the partners, so as to create a matrix of 'native-ised' pronunciations in which each name also has a pronunciation in each of the other languages, where it is a foreign name.
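One way to picture the matrix of 'native-ised' pronunciations is as a table with one row per (origin language, name) pair and one cell per target language. The sketch below is only an illustration under that assumption; the function names and the sample entry are hypothetical and do not come from the project.

```python
# The nine project languages, as listed in the record.
LANGUAGES = [
    "Danish", "Dutch", "English", "French", "German",
    "Greek", "Italian", "Portuguese", "Spanish",
]

def empty_matrix(names_by_language):
    """Build one row per (origin language, name), one empty cell per target language."""
    matrix = {}
    for origin, names in names_by_language.items():
        for name in names:
            matrix[(origin, name)] = {target: None for target in LANGUAGES}
    return matrix

def missing_cells(matrix):
    """List the (origin, name, target) cells that still lack a transcription."""
    return [
        (origin, name, target)
        for (origin, name), row in matrix.items()
        for target, pronunciation in row.items()
        if pronunciation is None
    ]

# Hypothetical example: a single French name with no transcriptions yet.
matrix = empty_matrix({"French": ["Dupont"]})
```

Each name then needs nine cells: its native pronunciation plus eight native-ised ones, which gives a sense of how quickly the exchange task grows with the size of the name lists.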

Achievements:

General information: The approach in the project is to define a dictionary consisting of names and their phonetic representation. To create this dictionary, a lexicographer will initially select, from a names list provided by the relevant industrial participant (see below), the 20000 to 50000 most frequently used names for the language. The dictionary will then be generated directly by hand, with an expert phonetician transcribing the conventional pronunciations of the names.
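The selection step described above can be sketched as a simple frequency count over a names list. This is a minimal illustration, assuming the list is a flat sequence of name strings with repeats; the function name, the case-folding choice and the sample data are hypothetical.

```python
from collections import Counter

def most_frequent_names(names, limit):
    """Return the `limit` most common names, most frequent first.

    Names are lower-cased and stripped so that spelling variants that
    differ only in case or padding are counted together (an assumption).
    """
    counts = Counter(name.strip().lower() for name in names if name.strip())
    return [name for name, _ in counts.most_common(limit)]

# Hypothetical toy list standing in for an industrial partner's names list.
sample = ["Smith", "Jones", "smith", "Brown", "Jones", "Smith"]
top_two = most_frequent_names(sample, 2)
```

In the project the limit would be in the range of 20000 to 50000 names per language.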

On the basis of the lexicon represented by this initial sample of 20000-50000 names, it will be possible to write an initial set of grapheme-to-phoneme conversion rules and to evaluate these rules in preparation for (semi-)automation of the further lexicographic work. An alternative preprocessor based on neural computing methods will also be investigated. One of the early tasks for the project will be to define the conversion standards and linguistic systems to be used for grapheme-to-phoneme conversion in each language, to ensure compatibility.
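The interplay between a hand-made lexicon and grapheme-to-phoneme rules can be sketched as follows. This is a toy illustration, not the project's rule formalism: the rule table, the phoneme symbols, the longest-match strategy and all function names are assumptions made for the example.

```python
# Toy rule table: grapheme sequence -> phoneme symbol (invented symbols).
RULES = {
    "sch": "S", "ei": "aI",
    "a": "a", "e": "e", "i": "i", "u": "u",
    "b": "b", "l": "l", "m": "m", "n": "n", "r": "r", "s": "s", "t": "t",
}

def apply_rules(name, rules=RULES):
    """Convert a name to phonemes, always trying the longest grapheme first."""
    name = name.lower()
    max_len = max(len(grapheme) for grapheme in rules)
    phonemes = []
    i = 0
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            chunk = name[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            # No rule covers this letter: mark it for the lexicographer.
            phonemes.append("?")
            i += 1
    return phonemes

def transcribe(name, lexicon, rules=RULES):
    """Prefer the hand-transcribed lexicon entry; fall back to the rules."""
    entry = lexicon.get(name.lower())
    return entry if entry is not None else apply_rules(name, rules)

def rule_accuracy(lexicon, rules=RULES):
    """Fraction of lexicon entries that the rules alone reproduce exactly."""
    correct = sum(
        1 for name, phones in lexicon.items()
        if apply_rules(name, rules) == phones
    )
    return correct / len(lexicon)

# Hypothetical hand-transcribed entries used to evaluate the rules.
lexicon = {
    "stein": ["S", "t", "aI", "n"],
    "schmit": ["S", "m", "i", "t"],
}
```

Names where the rules disagree with the hand transcription (here "stein") are the ones that would prompt a new or refined rule, which is how a growing lexicon can extend the rule set.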

The primary deliverables from this project will be a set of multi-language, machine-readable CD-ROM pronunciation dictionaries for European city and town names, street names, family names, company names and product names. These dictionaries could be made available to the European Language Industry for commercial exploitation on a royalty basis.

The results from the project, in the form of machine-readable lexicons prescribing the pronunciation of names, will constitute a valuable linguistic resource which allows natural language products to handle names correctly. Benefits will be felt in systems such as future map information systems which can recognise and synthesise names accurately; future automated directory enquiry systems which can provide telephone numbers using advanced machine dialogues, recognising the desired name and address; and systems such as talking newspapers and books (for the blind) which can accommodate occurrences of names without pronunciation errors.

The project has been designed to be operated by a group of nine academic partners with nine Associated Partners from the European telecommunications industry. In itself, this group represents a major user group for potential downstream exploitation of the results of the project.

Project Details


Start date:1993-01-01

End date:1995-01-01

Duration:24 months

Project Reference:LRE61004

Programme Acronym: LRE

Programme type:Third Framework Programme

Contract type:No contract type

Subject index:Information Processing, Information Systems
 

Results for this Project

ONOMASTICA: multi-language pronunciation dictionary of proper names  21/09/1998 

Other participants

Organization name:Università degli Studi di Pisa
Administrative contact Address
Name:CALZOLARI  Dipartimento di Linguistica Computazionale
Via della Faggiola 32
56100
Pisa
ITALIA

Region:CENTRO (I) TOSCANA Pisa
Tel:+39-50-560481
Fax:+39-50-589055
 
Organization name:Universidad Politécnica de Madrid
Administrative contact Address
Name:J M PARDO  Ciudad Universitaria
28040
Madrid
ESPAÑA

Region:COMUNIDAD DE MADRID
Tel:+34-1-5437597
Fax:+34-1-3367323
 
Organization name:Inst Nac de Eng de Sistemas Speech Comput
Administrative contact Address
Name:TRANCOSO  9 Rua Alves Redol
1000
Lisbon
PORTUGAL

Region:CONTINENTE LISBOA E VALE DO TEJO Grande Lisboa
Tel:+351-1-3155150
Fax:+351-1-525843
 
Organization name:Department of Electrical Engineering, University of Patras
Administrative contact Address
Name:KOKKINAKIS  Kato Kastritsi
26500
Patras
HELLAS

Region:KENTRIKI ELLADA DYTIKI ELLADA Achaia
Tel:+30-2610-991722
Fax:+30-2610-991855
 
Organization name:Technische Universität Berlin
Administrative contact Address
Name:Klaus FELLBAUM (Professor Dr-Ing) Institut für Fernmeldetechnik
Einsteinufer 25
10587
Berlin
DEUTSCHLAND

Region:BERLIN
Tel:+49-30-31425209
Fax:+49-30-31422514
 
Organization name:Katholieke Universiteit Nijmegen
Administrative contact Address
Name:KERKMAN  1 Wundtlaan
6525 XD
Nijmegen
NEDERLAND

Region:OOST-NEDERLAND GELDERLAND Arnhem/Nijmegen
Tel:+31-80-612117
Fax:+31-80-521213
 
Organization name:École Nationale Supérieure des Télécommunications
Administrative contact Address
Name:BONNET  46 rue Barrault
75634
Paris
FRANCE

Region:ÎLE DE FRANCE Ile de France Paris
Tel:+33-1-45817646
Fax:+33-1-45813119
 
Organization name:Speech Technology Centre, Aalborg University
Administrative contact Address
Name:DALSGAARD  Frederik Bajers Vej 7
9220
Aalborg
DANMARK

Region:DANMARK Danmark Nordjyllands amt
Tel:+45-98-158522
Fax:+45-98-156740
 

Record control number:17213




CORDIS is managed by the Publications Office