Content archived on 2022-12-23

GLOSSER

Objective

Glosser aims to provide computerized assistance to intermediate-level language learners.

A substantial barrier to the free flow of ideas and technologies is the linguistic barrier: ideas expressed in a particular language are only accessible to its speakers. Since ever-increasing numbers of people encounter texts electronically, automated methods of language processing may be brought to bear to alleviate this problem. Specifically, assistance may take the form of (i) access to bilingual dictionaries; (ii) analysis of the grammatical content of morphological information; (iii) access to similar examples in (bilingual) corpora.

For example, we imagine speakers of Estonian, Bulgarian or Hungarian who are intermediate learners or users of English reading a software manual on screen. Upon encountering an unknown word or an unfamiliar use of a known word, e.g. "reverts" as in "This action REVERTS the buffer to the form stored on disk", the user should be able to click on it to invoke on-line help (follow a hyperlink which is constructed dynamically). The help facility should be prepared to provide: (i) the entry for the word 'revert' (note that the morphological 's' must be stripped) in a bilingual dictionary (Estonian/English, etc.); (ii) an indication of the morphological content of the word form (the 's' indicates that it is 3rd person singular); and (iii) an invitation to seek out other examples of the word in on-line corpora (ideally in bilingual corpora).
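The lookup described above (strip the inflectional ending, find the lemma, report what the ending means) can be illustrated with a minimal sketch. It is not the project's implementation: the dictionary contents, the suffix rules and the names `TOY_DICTIONARY`, `SUFFIX_RULES` and `analyse` are assumptions made for illustration, and the glosses are placeholders rather than real translations.

```python
from typing import NamedTuple, Optional

# Hypothetical toy dictionary. Real entries would come from a bilingual
# dictionary; the values here are placeholders, not actual translations.
TOY_DICTIONARY = {
    "revert": "<target-language entry for 'revert'>",
    "buffer": "<target-language entry for 'buffer'>",
}

# Illustrative (suffix, meaning) rules. A bare 's' is ambiguous between a
# verb ending and a noun plural; a part-of-speech tagger would decide.
SUFFIX_RULES = [
    ("s", "3rd person singular (verb) or plural (noun)"),
    ("ed", "past tense or past participle"),
    ("ing", "present participle"),
]


class Analysis(NamedTuple):
    lemma: str      # dictionary headword
    features: str   # what the stripped ending indicates
    entry: str      # dictionary entry for the lemma


def analyse(word_form: str) -> Optional[Analysis]:
    """Return lemma, morphological note and entry, or None if unknown."""
    form = word_form.lower()
    # The word may already be a headword.
    if form in TOY_DICTIONARY:
        return Analysis(form, "base form", TOY_DICTIONARY[form])
    # Otherwise try stripping each known suffix in turn.
    for suffix, features in SUFFIX_RULES:
        if form.endswith(suffix):
            stem = form[: -len(suffix)]
            if stem in TOY_DICTIONARY:
                return Analysis(stem, features, TOY_DICTIONARY[stem])
    return None


if __name__ == "__main__":
    print(analyse("REVERTS"))
```

Plain suffix stripping of this kind does not cover spelling changes or irregular forms, which is one reason a fuller rule-based morphological analyser is needed.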

A modest prototype demonstrating the capabilities above will be built using rule-based morphological analysis. To obviate the danger that purely morphological methods will discriminate too little in choosing relevant lemmata and word senses, we will employ tagging systems (either rule-based or stochastic) for part-of-speech and word-sense identification. The project will build this prototype while exploring further areas which will require extensions and elaborations:

1) phonic information, including pointers to digital sound recordings in the dictionary entry.
2) constructing useful bilingual correspondences (rough dictionary equivalents) automatically through processing of bilingual corpora.
3) providing entries in a way sensitive to linguistic context--especially for treating compounds ("scroll bar position") and other multi-word lexemes ("keep track of").
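One simple way to make lookup sensitive to context for multi-word lexemes, as in item 3, is to check whether the selected word belongs to a longer dictionary entry before falling back to the single word. The sketch below assumes a hypothetical set `MULTIWORD_ENTRIES` and a hypothetical function `find_multiword`; it matches surface forms only and is meant as an illustration, not as the approach the project adopted.

```python
from typing import List, Optional, Tuple

# Hypothetical multi-word entries, taken from the examples in the text.
MULTIWORD_ENTRIES = {
    ("scroll", "bar", "position"),
    ("keep", "track", "of"),
}


def find_multiword(
    tokens: List[str], index: int, max_len: int = 4
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest entry covering tokens[index].

    The span is half-open, so tokens[start:end] is the matched lexeme.
    Returns None when the selected token is not part of any entry.
    """
    lowered = [t.lower() for t in tokens]
    # Try the longest candidate windows first.
    for length in range(max_len, 1, -1):
        # Every window of this length that contains the selected token.
        for start in range(index - length + 1, index + 1):
            end = start + length
            if start < 0 or end > len(lowered):
                continue
            if tuple(lowered[start:end]) in MULTIWORD_ENTRIES:
                return start, end
    return None


if __name__ == "__main__":
    words = "You should keep track of the scroll bar position".split()
    span = find_multiword(words, words.index("track"))
    if span is not None:
        print(" ".join(words[span[0]:span[1]]))
```

Inflected variants such as "keeps track of" would not match here; combining this lookup with morphological analysis of each token would be needed to handle them.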

Topic(s)

Data not available

Call for proposal

Data not available

Funding scheme

CSC - Cost-sharing contracts

Coordinator

Rijksuniversiteit Groningen
EU contribution
No data
Address
Oude Kijk in 't Jatstraat 26
9700 AS Groningen
Netherlands


Total cost
No data

Participants (4)