Building on the results of the EuroWordNet project, which developed wordnets for Western European languages, the BalkaNet IST project extends the wordnet approach to the less-studied Balkan languages. The effort also aims to strengthen ties among the academic and research communities in the region and with those elsewhere in Europe.
First developed for English at Princeton University in the United States, a wordnet is a semantic lexicon that groups words into sets of synonyms called synsets, provides a short definition for each synset, and records the various semantic relations between words. A multilingual wordnet therefore combines a dictionary, a thesaurus and a translation tool, and, as in the case of BalkaNet, can underpin an Internet search engine that matches concepts rather than specific words. BalkaNet covers Bulgarian, Greek, Romanian, Serbian and Turkish, and extends the Czech wordnet previously developed by the EuroWordNet project.
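As a rough illustration of the structure described above, the following Python sketch models a synset as a set of synonymous words with a short definition and named links to other synsets. The class name, field names, relation labels and the two toy English entries are assumptions made for this example; they are not taken from BalkaNet or from the Princeton data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One concept: its synonymous words, a gloss, and links to other synsets."""
    synset_id: str                 # hypothetical identifier, invented for this sketch
    language: str                  # language code of the lemmas
    lemmas: List[str]              # the words that share this meaning
    gloss: str                     # short definition
    relations: Dict[str, List[str]] = field(default_factory=dict)


# Two toy entries, written by hand for illustration only.
dog = Synset("toy-dog", "en", ["dog", "domestic dog"],
             "a domesticated canine kept as a pet or working animal",
             {"hypernym": ["toy-canine"]})
canine = Synset("toy-canine", "en", ["canine", "canid"],
                "a member of the dog family",
                {"hyponym": ["toy-dog"]})

lexicon = {s.synset_id: s for s in (dog, canine)}


def synonyms(word: str, lexicon: Dict[str, Synset]) -> List[str]:
    """Return the other lemmas of every synset that contains `word`."""
    found = set()
    for synset in lexicon.values():
        if word in synset.lemmas:
            found.update(lemma for lemma in synset.lemmas if lemma != word)
    return sorted(found)


def related(synset_id: str, relation: str, lexicon: Dict[str, Synset]) -> List[Synset]:
    """Follow one named relation from a synset to the synsets it points at."""
    targets = lexicon[synset_id].relations.get(relation, [])
    return [lexicon[t] for t in targets if t in lexicon]
```

With these toy entries, `synonyms("dog", lexicon)` yields the other word in the same synset, and `related("toy-dog", "hypernym", lexicon)` follows the link to the broader concept.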
“A researcher who is unsure of what keywords to use in a search could, for example, use this to find additional keywords related to the information he is looking for because the system links words to a concept in any of the languages we have incorporated,” explains project manager Sofia Stamou.
“It could be used by a Greek who needs to find synonyms in Bulgarian or Turkish, something that is particularly useful for people in cross-border areas or researchers working in different countries,” adds BalkaNet coordinator Dimitrios Christodoulakis.
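The kind of lookup the two project leaders describe might be sketched as follows. This is a minimal example under stated assumptions, not the BalkaNet implementation: the concept identifiers, the `CONCEPTS` table and the `expand_keywords` function are hypothetical, and the handful of word entries were chosen by hand for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical concept table: concept id -> language code -> words for that concept.
# The entries are hand-picked for illustration and do not come from BalkaNet.
CONCEPTS: Dict[str, Dict[str, List[str]]] = {
    "toy-dog": {
        "el": ["σκύλος"], "bg": ["куче"], "tr": ["köpek"],
        "ro": ["câine"], "sr": ["pas"], "cs": ["pes"],
    },
    "toy-house": {
        "el": ["σπίτι"], "bg": ["къща"], "tr": ["ev"],
        "ro": ["casă"], "sr": ["kuća"], "cs": ["dům"],
    },
}


def expand_keywords(word: str,
                    concepts: Dict[str, Dict[str, List[str]]],
                    languages: Optional[Set[str]] = None) -> Dict[str, List[str]]:
    """Find every concept that `word` belongs to and collect the words
    linked to the same concept, optionally restricted to some languages."""
    expanded: Dict[str, Set[str]] = {}
    for entries in concepts.values():
        # Skip concepts that do not contain the query word in any language.
        if not any(word in lemmas for lemmas in entries.values()):
            continue
        for lang, lemmas in entries.items():
            if languages is not None and lang not in languages:
                continue
            for lemma in lemmas:
                if lemma != word:
                    expanded.setdefault(lang, set()).add(lemma)
    return {lang: sorted(lemmas) for lang, lemmas in expanded.items()}


# A Greek speaker looking for Bulgarian and Turkish equivalents of "σκύλος".
suggestions = expand_keywords("σκύλος", CONCEPTS, {"bg", "tr"})
```

Because the lookup goes through the shared concept rather than through word pairs, adding a language only requires attaching its words to the existing concepts.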
The two project leaders, both at the University of Patras in Greece, note that the wordnet approach lets researchers use their own words when carrying out a search, rather than being tied to the specific wording and rationale of electronic databases, which produce a match only if the right keywords are used.
In the case of BalkaNet, the project partners had to overcome a lack of existing resources for Balkan languages, especially digitised ones, and in some instances had to produce their own lexicons. A pilot application was used to test the quality of the translation and the completeness of the system.
“We used it to align the wording and annotate George Orwell’s book 1984 across the six languages,” Stamou says.
The partners are currently using the system themselves and are planning to make it available to the broader research community in the near future.
Contact:
Dimitrios Christodoulakis
Database Laboratory
Computer Engineering and Informatics Department
University of Patras
GR-26500 Patras
Greece
Tel: +30-2610960385
Fax: +30-2610960438
Email: dxri@cti.gr
Source: Based on information from BalkaNet