In the information society, the language barrier is one of the main obstacles to the automatic use, integration and manipulation of knowledge. This is manifested in the lack of intelligent systems able to perform unified semantic processing of textual resources across many different languages. To create such systems, a necessary step is to assign the appropriate meanings to the words in documents, a task referred to as Word Sense Disambiguation (WSD). WSD is typically performed in a monolingual setting, but enabling multilingual processing requires exploiting the semantic connections between word senses (i.e., meanings) in different languages. Current state-of-the-art systems, however, mainly rely on bilingual aligned text collections or limited-coverage multilingual resources to perform cross-lingual disambiguation, an unrealistic requirement when working with an arbitrary number of language pairs.
Here we propose a research program that will investigate radically new directions for performing multilingual WSD. The key intuition underlying our proposal is that WSD can be performed globally, exploiting knowledge available in many languages at the same time. The first stage will involve the development of a methodology for automatically creating a large-scale, multilingual knowledge base. In the second stage, this lexical resource will be used to design and evaluate novel graph-based algorithms that jointly perform disambiguation across different languages. Crucially, we aim to show that these two tasks are mutually beneficial and that, together, they can go beyond current state-of-the-art WSD systems. The proposed project will have an impact not only on WSD research, but also on related areas such as Information Retrieval and Machine Translation.