
Multilingual Joint Word Sense Disambiguation

Final Report Summary - MULTIJEDI (Multilingual Joint Word Sense Disambiguation)

What does it mean? - how computers can improve understanding among the babel of human languages.

When faced with a written word, sentence, paragraph or longer piece of text in their own language, literate humans rarely encounter difficulties handling the linguistic components required for interpreting the meaning. Despite the inherent complexity of the process, most of the time they succeed in understanding textual content without even realising how they are accomplishing it.
For example, let’s consider the following sentence: “Spring water can be found at different altitudes.”
A human reader will immediately identify the meaning of spring as “a natural flow of ground water”, rather than the season, and will also grasp the correct senses of water (i.e. the common liquid sense as opposed to, e.g., the “body of water” sense) and of altitude (i.e. the geographical, not the geometrical, sense). In fact, by holding the various possible senses of words in mind and selecting the one correct sense for a given context, humans perform, continuously and effortlessly, a process of semantic disambiguation.
However, while this process is painless and straightforward for humans, it is far harder for computers. Automating this task, called Word Sense Disambiguation (WSD), requires computationally determining the meaning of words in context, and for many years it has been considered one of the hardest problems in Artificial Intelligence.
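To make “computationally determining the meaning of words in context” concrete, here is a minimal sketch of the classic simplified Lesk baseline: pick the sense whose dictionary definition shares the most words with the sentence. This is a textbook technique, not the MultiJEDI method, and the three sense glosses for spring below are hypothetical, hand-written for illustration only.

```python
import re

# Hypothetical, hand-written sense inventory (illustration only, not BabelNet data).
SENSES = {
    "spring": {
        "spring/water_source": "a natural flow of ground water emerging at the surface",
        "spring/season": "the season of the year between winter and summer",
        "spring/coil": "a metal coil that returns to its shape after being stretched",
    },
}

# Small stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "at", "to", "its", "can", "be",
             "and", "after", "being", "between", "that", "is"}


def tokens(text):
    """Lowercase content words of a text, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence.

    Ties (including all-zero overlap) go to the first sense listed.
    """
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("spring", "Spring water can be found at different altitudes."))
```

Here the shared word “water” selects the water-source sense. Gloss overlap is brittle (it depends on exact word matches and on how definitions happen to be phrased), which is one reason WSD has long been considered so hard.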
The MultiJEDI (Multilingual Joint word sensE DIsambiguation) project’s approach to the WSD problem is based on a key insight: multilinguality, i.e. knowledge expressed in many different languages, is a fundamental resource that can be “leveraged”, or drawn upon, as a powerful catalyst in tackling the disambiguation task. The distinctive feature of the algorithms developed within the project is their ability to draw on the wide variety of cultural and linguistic knowledge originating from all over the world to resolve the ambiguity of words in context in any language.
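The intuition that other languages help resolve ambiguity can be sketched with a toy example. An English word such as spring is ambiguous, but its translations usually are ambiguous in different ways, so intersecting their possible senses can leave a single candidate. This illustrates the general idea only, under assumed inputs: the sense labels and the small lexicon below are hypothetical, the translations are assumed to be given, and this is not the algorithm used by MultiJEDI or Babelfy.

```python
# Hypothetical toy lexicon: the set of senses each (language, word) pair can express.
LEXICON = {
    ("en", "spring"): {"WATER_SOURCE", "SEASON", "COIL"},
    ("it", "sorgente"): {"WATER_SOURCE", "ORIGIN"},
    ("de", "Quelle"): {"WATER_SOURCE", "ORIGIN"},
    ("it", "primavera"): {"SEASON"},
}


def disambiguate_with_translations(word, translations):
    """Narrow the senses of `word` using the senses of its translations.

    Each translation restricts the candidates to the senses it shares with them.
    A translation that would eliminate every remaining candidate is ignored,
    so one noisy translation cannot leave the word with no sense at all.
    """
    candidates = set(LEXICON[word])
    for translation in translations:
        narrowed = candidates & LEXICON[translation]
        if narrowed:
            candidates = narrowed
    return candidates


print(disambiguate_with_translations(("en", "spring"), [("it", "sorgente"), ("de", "Quelle")]))
```

With the Italian sorgente and the German Quelle, only the water-source sense survives. With the Italian primavera instead, only the season sense would remain.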
A key intermediate objective of the project has been the creation of BabelNet, which is both a large multilingual “encyclopedic dictionary” (usable by humans and machines alike) and a semantic network (knowledge readily accessible only to machines), automatically constructed from existing Web resources such as Wikipedia. BabelNet now covers 271 languages and is available online at http://babelnet.org, and work continues on building a richer and more complex network of concepts and entities.
The creation of a multilingual dictionary in electronic format is a trailblazing step towards the automatic disambiguation of text in any language. BabelNet has in fact enabled the development of novel, effective methods for joint multilingual word sense disambiguation, achieving outcomes not dissimilar to those of human reasoning. The resulting system, called Babelfy (http://babelfy.org), is able to interpret and understand text written in any language.
Our high-performance results indicate that MultiJEDI is paving the way for computers to facilitate improved understanding among the babel of human languages.
Right after the end of the project, the EU Commission, EU Parliament and EU Publications Office organized a workshop on its outcomes, attended by more than one hundred representatives of companies and universities (http://babelnet.org/lux). The workshop showed great promise for important developments enabled by the project, in both academia and industry.