Lifelong UNiversal lAnguage Representation

Project description

Deep learning algorithms to improve machine translation

Machine translation is automated translation performed by a computer without human involvement. Despite technological progress and the decidedly multilingual nature of our planet, speech and language technology has not kept pace with demand across all languages. The EU-funded LUNAR project will develop a multilingual and multimodal model built on a lifelong universal language representation. This model will compensate for the lack of supervised data and significantly increase the system's capacity to generalize. It will also reduce the number of translation systems required from quadratic to linear and allow incremental adaptation to new languages and data.

Objective

Why is machine translation between English and Portuguese significantly better than machine translation between Dutch and Spanish? Why do speech recognizers work better in German than in Finnish? In both cases, the main problem is the insufficient amount of labelled training data. Although the world is multimodal and highly multilingual, speech and language technology is not keeping up with the demand in all languages. We need better learning methods that exploit the advances made in a few modalities and languages for the benefit of others. This proposal addresses the low-resource problem and the high cost of the current approach to multilingual machine translation, which requires a separate system for every translation pair.
LUNAR proposes to jointly learn a multilingual and multimodal model that builds upon a lifelong universal language representation. This model will compensate for the lack of supervised data and significantly increase the system's capacity to generalize from training data, given the unconventional variety of resources employed. It will reduce the number of required translation systems from quadratic to linear and allow for incremental adaptation to new languages and data.
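The quadratic-to-linear claim can be illustrated with a small count. This is a sketch under assumptions not stated in the text: that a pairwise setup needs one system per ordered language pair, and that a shared representation needs one encoder and one decoder per language. The function names are hypothetical.

```python
def pairwise_systems(n_languages: int) -> int:
    """Systems needed if every ordered language pair gets its own model (assumption)."""
    return n_languages * (n_languages - 1)


def shared_representation_modules(n_languages: int) -> int:
    """Modules needed with one encoder and one decoder per language (assumption)."""
    return 2 * n_languages


# Compare both counts for a few illustrative language-set sizes.
for n in (4, 10, 24):
    print(n, pairwise_systems(n), shared_representation_modules(n))
```

Under these assumptions the pairwise count grows with the square of the number of languages, while the shared-representation count grows in proportion to it.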
The high-risk/high-gain aspect lies in automatically training a universal language representation with specifically designed deep learning algorithms. LUNAR will employ an encoder-decoder architecture. The encoder produces an abstraction of the input by reducing its dimensionality, and this abstraction will become the proposed universal language representation; from it, the decoder generates the output. The internal encoder-decoder architecture will be designed for learning the universal language representation, which will be integrated as an objective of the architecture.
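The data flow described above can be sketched with a toy model. This is not LUNAR's architecture, which the text does not specify: the class `ToyTranslator`, its dimensions, and its random, untrained linear maps are all illustrative assumptions. It only shows per-language encoders reducing an input to a shared lower-dimensional vector, per-language decoders generating output from it, and a new language being added without touching existing modules.

```python
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix; a stand-in for trained parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyTranslator:
    """Hypothetical sketch: one encoder and one decoder per language."""

    def __init__(self, languages, input_dim=8, shared_dim=3, seed=0):
        self.input_dim = input_dim
        self.shared_dim = shared_dim  # smaller than input_dim: dimensionality reduction
        self.rng = random.Random(seed)
        self.encoders = {}
        self.decoders = {}
        for lang in languages:
            self.add_language(lang)

    def add_language(self, lang):
        """Add modules for a new language; existing ones are left unchanged."""
        self.encoders[lang] = make_matrix(self.shared_dim, self.input_dim, self.rng)
        self.decoders[lang] = make_matrix(self.input_dim, self.shared_dim, self.rng)

    def encode(self, lang, features):
        """Map language-specific features to the shared representation."""
        return matvec(self.encoders[lang], features)

    def decode(self, lang, shared):
        """Generate target-language features from the shared representation."""
        return matvec(self.decoders[lang], shared)

    def translate(self, source, target, features):
        return self.decode(target, self.encode(source, features))


model = ToyTranslator(["nl", "es"])
shared = model.encode("nl", [0.5] * 8)
output = model.translate("nl", "es", [0.5] * 8)
model.add_language("fi")  # incremental extension
```

With two languages the model holds four modules; adding Finnish adds two more rather than a new system for each pair.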
LUNAR will impact multidisciplinary communities of specialists in computer science, mathematics, engineering and linguistics who work on natural language understanding and speech processing applications.

Funding scheme

ERC-STG - Starting Grant

Host institution

UNIVERSITAT POLITECNICA DE CATALUNYA

Net EU contribution

€ 1 498 723,00

Address

CALLE JORDI GIRONA 31
08034 Barcelona
Spain

Region

Este Cataluña Barcelona

Activity type

Higher or Secondary Education Establishments

Total cost

€ 1 498 723,00

Beneficiaries (1)

UNIVERSITAT POLITECNICA DE CATALUNYA

Spain

Net EU contribution

€ 1 498 723,00
