Multi-modal Context Modelling for Machine Translation

Project Information

MultiMT

Grant agreement ID: 678017

DOI

10.3030/678017

Project closed

EC signature date 10 March 2016

Start date 1 July 2016

End date 31 December 2021

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 493 771,00

EU contribution

€ 1 493 771,00


Coordinated by

IMPERIAL COLLEGE OF SCIENCE TECHNOLOGY AND MEDICINE
United Kingdom

Project description

A new era in machine translation

In the field of Natural Language Processing (NLP), automatically translating human language has long been a central goal. However, current approaches such as Statistical Machine Translation (SMT) often overlook vital contextual cues that human translators rely on, such as associated images and related documents. As a result, machine translations can miss relevant information or convey incorrect meanings, which hampers reading comprehension and in many cases renders them useless. The ERC-funded MultiMT project takes an innovative approach by harnessing global multi-modal information. It will develop methods to incorporate contextual cues such as images, related documents and metadata into translation models, using Twitter posts and product reviews as test datasets. This interdisciplinary initiative combines expertise from NLP, Computer Vision and Machine Learning.

Objective

Automatically translating human language has been a long sought-after goal in the field of Natural Language Processing (NLP). Machine Translation (MT) can significantly lower communication barriers, with enormous potential for positive social and economic impact. The dominant paradigm is Statistical Machine Translation (SMT), which learns to translate from human-translated examples.

When translating, human translators have access to a number of contextual cues beyond the segment being translated, for example images associated with the text and related documents. SMT systems, however, completely disregard any form of non-textual context and make little or no reference to the wider surrounding text. This results in translations that miss relevant information or convey incorrect meaning. Such issues drastically affect reading comprehension and may make translations useless. The problem is especially critical for user-generated content such as social media posts, which are often short and contain non-standard language, but it applies to a wide range of text types.

The novel and ambitious idea in this proposal is to devise methods and algorithms that exploit global multi-modal information for context modelling in SMT. This will require a significantly disruptive approach, with new ways to acquire multilingual multi-modal representations and new machine learning and inference algorithms that can process rich context models. The focus will be on three context types: global textual content from the document and related texts; visual cues from images; and metadata, including topic, date, author and source. Two challenging user-generated datasets will be used as test beds: Twitter posts and product reviews.

This highly interdisciplinary research proposal draws on expertise from NLP, Computer Vision and Machine Learning, and claims that appropriate modelling of multi-modal context is key to achieving a new breakthrough in SMT, regardless of language pair and text type.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

H2020-EU.1.1. - EXCELLENT SCIENCE - European Research Council (ERC) MAIN PROGRAMME

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

ERC-StG-2015 - ERC Starting Grant

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

ERC-STG - Starting Grant

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

ERC-2015-STG

Host institution

IMPERIAL COLLEGE OF SCIENCE TECHNOLOGY AND MEDICINE

Net EU contribution

€ 1 010 513,67

Address

SOUTH KENSINGTON CAMPUS EXHIBITION ROAD
SW7 2AZ London
United Kingdom

Region

London, Inner London - West, Westminster

Activity type

Higher or Secondary Education Establishments

Total cost

€ 1 010 513,67

Beneficiaries (2)

IMPERIAL COLLEGE OF SCIENCE TECHNOLOGY AND MEDICINE

United Kingdom

Net EU contribution

€ 1 010 513,67

THE UNIVERSITY OF SHEFFIELD

United Kingdom

Net EU contribution

€ 483 257,33