Community Research and Development Information Service - CORDIS

Multilanguage pronunciation dictionary on CD-ROM of names in Central and Eastern Europe

A project was set up to prepare a machine-based pronunciation lexicon of 250 000 Czech names. The names were taken from the central register and comprise all village, town and city names, as well as first and last names. Pronunciation rules were generated, then tested and continually improved. For village, town and city names, the machine transcription is 100% accurate. Street names proved more complex, because some are of foreign origin with variable pronunciation and others contain numbers or abbreviations. First and last names were handled with particular attention paid to:
foreign names with variable spelling and pronunciation;
absence of diacritics in the names of Czechs abroad;
level of transliteration of Russian, Ukrainian and other names;
irregularities in spelling (especially the mixture of Polish and Hungarian (East Slovak) names, and Romani names).

The lexicon provides users with the variant pronunciations of names used by Czech speakers. It can be used by telecom companies for both speech production and speech recognition. It can also help people in civil administration and business to understand local speakers and to deliver fully intelligible information. If the user does not go beyond the set of Czech names, he or she can fully rely on the automatic machine transcription, which is almost 100% accurate. For a foreign name, the user can choose between variants: a machine transcription that complies with the Czech rules of pronunciation, or a more or less approximate Czech version.
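One of the issues listed above, names of Czechs abroad written without diacritics, can be illustrated with a small lookup sketch. This is not the project's method, only a minimal illustration under stated assumptions: the names in `SAMPLE_NAMES` are illustrative entries rather than data from the lexicon, and the helper names `strip_diacritics`, `build_index` and `candidates` are hypothetical. The idea is to index every lexicon entry by its diacritic-free form, so that a spelling such as "Dvorak" can be mapped back to the candidate Czech spellings before a pronunciation is chosen.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(name):
    """Return a case-folded copy of `name` with combining marks removed."""
    # NFD splits letters such as "ř" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base.casefold()


def build_index(names):
    """Map each diacritic-free key to the list of spellings that share it."""
    index = defaultdict(list)
    for name in names:
        index[strip_diacritics(name)].append(name)
    return index


# Illustrative entries only (assumption): not taken from the project's lexicon.
SAMPLE_NAMES = ["Novák", "Dvořák", "Černý", "Procházka"]
INDEX = build_index(SAMPLE_NAMES)


def candidates(written):
    """Return lexicon spellings that match `written` once diacritics are ignored."""
    return INDEX.get(strip_diacritics(written), [])


if __name__ == "__main__":
    for spelling in ("Dvorak", "CERNY", "Smith"):
        print(spelling, "->", candidates(spelling))
```

The lookup returns a list because several lexicon entries may collapse to the same diacritic-free key; in that case a user would be offered all matching variants rather than a single guess.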

Reported by

Slezská Univerzita
Bezrucovo Nam 13
74601 Opava
Czech Republic