Periodic Reporting for period 2 - LUNAR (Lifelong UNiversal lAnguage Representation)
Reporting period: 2022-02-01 to 2023-11-30
[Relevance in society] Although the world is multimodal and highly multilingual, speech and language technologies are not keeping up with demand across all languages. We need better learning methods that exploit advances in a few modalities and languages for the benefit of the others.
[Objectives] This proposal addresses the low-resource problem and the cost of the standard approach to multilingual machine translation, in which a separate system is required for every translation pair. LUNAR proposes to jointly learn a multilingual and multimodal model that builds upon a lifelong universal language representation. This model will compensate for the lack of supervised data and significantly increase the system's capacity to generalize from training data, given the unconventional variety of resources employed. It will reduce the number of required translation systems from quadratic to linear in the number of languages, while allowing incremental adaptation to new languages and data; a rough illustration of this scaling follows.
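As a minimal sketch of the scaling argument (hypothetical helper, not project code): with N languages, one bilingual system per directed translation pair grows quadratically, whereas one encoder and one decoder per language grow linearly.

```python
# Illustrative only: count the systems/modules needed under each approach.
def systems_needed(n_languages: int) -> dict:
    return {
        "bilingual_pairwise": n_languages * (n_languages - 1),  # one system per directed pair
        "language_specific_modules": 2 * n_languages,           # one encoder + one decoder per language
    }

for n in (5, 20, 100):
    print(n, systems_needed(n))
# 5 -> 20 vs 10;  20 -> 380 vs 40;  100 -> 9900 vs 200
```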
[Methodology] The high-risk/high-gain approach relies on automatically learning a universal language with specifically designed deep learning algorithms. LUNAR will employ an encoder-decoder architecture, sketched below. The encoder represents an abstraction of the input by reducing its dimensionality, and this abstraction becomes the proposed universal language; from it, the decoder generates the output. The internal encoder-decoder architecture will be explicitly designed for learning the universal language, which will be appropriately integrated as a training objective of the architecture.
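A minimal sketch of this encoder-decoder idea, assuming a Transformer backbone in PyTorch (class names, layer counts and dimensions are illustrative assumptions, not the project's actual architecture): the encoder maps the input to a reduced-dimensionality intermediate representation, and the decoder generates the output from that representation alone.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input token sequence to a reduced-dimensionality representation."""
    def __init__(self, vocab_size: int, d_model: int = 512, d_latent: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_latent = nn.Linear(d_model, d_latent)  # the "universal" intermediate representation

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        return self.to_latent(self.encoder(self.embed(src)))

class Decoder(nn.Module):
    """Generates the output sequence conditioned only on the intermediate representation."""
    def __init__(self, vocab_size: int, d_model: int = 512, d_latent: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.from_latent = nn.Linear(d_latent, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tgt: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
        memory = self.from_latent(latent)
        return self.out(self.decoder(self.embed(tgt), memory))
```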
[General impact] LUNAR will impact highly multidisciplinary communities of specialists in computer science, mathematics, engineering and linguistics who work on natural language understanding and on natural language and speech processing applications.
State-of-the-art multilingual machine translation relies on a universal encoder-decoder. This architecture allows knowledge transfer but forces a dependency between languages: adding a new language requires retraining the entire system.
In (Escolano, Costa-jussà et al., 2021a), we propose an alternative approach based on language-specific encoder-decoders, which can thus be more easily extended to new languages by learning their corresponding modules. To encourage a common interlingua representation, we train the N initial languages jointly. Our experiments show that the proposed approach outperforms the universal encoder-decoder by a significant margin, while allowing new languages to be added through incremental training, without retraining the remaining modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings; a sketch of the idea follows.
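A minimal sketch of the language-specific modular design, reusing the hypothetical Encoder and Decoder classes sketched above (class and method names are assumptions, not the released implementation): each language owns its encoder and decoder, any encoder can be paired with any decoder through the intermediate representation, and a new language is added by training only its own modules while the existing ones stay frozen.

```python
import torch.nn as nn

class ModularMT(nn.Module):
    def __init__(self, vocab_sizes: dict, d_model: int = 512):
        super().__init__()
        self.encoders = nn.ModuleDict({l: Encoder(v, d_model) for l, v in vocab_sizes.items()})
        self.decoders = nn.ModuleDict({l: Decoder(v, d_model) for l, v in vocab_sizes.items()})

    def forward(self, src_tokens, tgt_tokens, src_lang: str, tgt_lang: str):
        latent = self.encoders[src_lang](src_tokens)       # shared intermediate representation
        return self.decoders[tgt_lang](tgt_tokens, latent)

    def add_language(self, lang: str, vocab_size: int, d_model: int = 512):
        # Incremental training: freeze all existing modules, learn only the new ones.
        for p in self.parameters():
            p.requires_grad = False
        self.encoders[lang] = Encoder(vocab_size, d_model)
        self.decoders[lang] = Decoder(vocab_size, d_model)
```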
While multilingual machine translation approaches rely on increasingly large, high-quality data sets, current end-to-end approaches to spoken language translation rely on limited training resources, especially in multilingual settings. Our method, presented in (Escolano, Costa-jussà et al., 2021b), extends our multilingual machine translation architecture based on language-specific encoders and decoders to multilingual spoken language translation. It entirely eliminates the dependency on multilingual spoken language translation data and is able to translate while training only on automatic speech recognition and multilingual machine translation data. Our experiments on four different languages show that coupling the speech encoder to the multilingual machine translation architecture produces translations of similar quality to a bilingual baseline, while effectively enabling zero-shot multilingual spoken language translation. Additionally, we propose an Adapter module for coupling the speech inputs, which produces consistent improvements; a sketch of this module follows.
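A minimal sketch of such an Adapter module (dimensions and names are assumptions, not the project's configuration): a small bottleneck network that maps the states of a pretrained speech encoder into the representation space expected by the text decoders, so the multilingual translation modules can be reused without any end-to-end speech translation data.

```python
import torch
import torch.nn as nn

class SpeechAdapter(nn.Module):
    def __init__(self, d_speech: int = 768, d_latent: int = 256, d_hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(d_speech),
            nn.Linear(d_speech, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_latent),
        )

    def forward(self, speech_states: torch.Tensor) -> torch.Tensor:
        # speech_states: (batch, frames, d_speech) from a pretrained ASR encoder
        return self.net(speech_states)  # (batch, frames, d_latent), fed to the text decoders
```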
By-product contributions of the main research direction include a survey of continual lifelong learning in natural language processing (Biesialska et al., 2020), participation in evaluation campaigns on machine translation (Escolano et al., 2021) and speech translation (Gállego et al., 2021), and the organization of such a campaign (Barrault et al., 2021). Beyond this, progress has been made on interpretability (Ferrando & Costa-jussà, 2021) and on integrating linguistic knowledge into neural systems (Armengol & Costa-jussà, 2021; Armengol, Costa-jussà et al., 2021). Finally, we have made progress on mitigating gender bias in machine translation (Costa-jussà and de Jorge, 2020) and on documenting it (Hardmeier et al., 2021). In this direction, we show that multilingual machine translation architectures that differ in how many modules and parameters are shared among languages differ in gender bias accuracy even when trained on the same data sets (Costa-jussà et al., 2020). Experiments on four language pairs show that our proposed language-specific encoders-decoders exhibit less bias than the universal encoder-decoder architecture. Further interpretability analysis of the source embeddings and the attention shows that, in the language-specific case, the embeddings encode more gender information and the attention is more diverted. Both behaviors help mitigate gender bias.
Overview of exploitation and dissemination.
The project exploitation culminates in our US patent proposal, Multilingual Translator (Costa-jussà et al., 2021). Dissemination has reached a wide variety of media and institutions, including the Asociación española para el avance de la ciencia, Ajuntament de Barcelona, Generalitat de Catalunya, Fundació “la Caixa”, International Interpreting Services, Praire, CSIC, UPC, and Govern Obert.