Deep Learning for Structured Prediction in Natural Language Processing

Periodic Reporting for period 4 - DeepSPIN (Deep Learning for Structured Prediction in Natural Language Processing)

Reporting period: 2022-08-01 to 2023-07-31

Deep learning is revolutionizing the field of Natural Language Processing (NLP), with breakthroughs in machine translation, speech recognition, and question answering. New language interfaces (digital assistants, messenger apps, customer service bots) are emerging as the next technologies for seamless, multilingual communication among humans and machines.

From a machine learning perspective, many problems in NLP can be characterized as structured prediction: they involve predicting structurally rich and interdependent output variables. Despite this, current neural NLP systems ignore the structural complexity of human language, relying on simplistic and error-prone greedy search procedures. This leads to serious mistakes in machine translation, such as dropped words or mistranslated named entities. More broadly, neural networks lack the key structural mechanisms for solving complex real-world tasks that require deep reasoning.

This project has attacked these fundamental problems by bringing together deep learning and structured prediction, with a highly disruptive and cross-disciplinary approach. First, we endowed neural networks with a planning mechanism to guide structural search, letting decoders learn the optimal order in which they should operate. This builds a bridge to reinforcement learning and combinatorial optimization. Second, we developed new ways of automatically inducing latent structure inside the network, making it more expressive, scalable, and interpretable, exploiting synergies with probabilistic inference and sparse modeling techniques. To complement these two innovations, we investigated new ways of incorporating weak supervision to reduce the need for labeled data.

Three highly challenging applications served as testbeds: machine translation, quality estimation, and dependency parsing. To maximize technological impact, this work was carried out in collaboration with Unbabel, a start-up company in the crowd-sourced translation industry.
I present below a summary of the main activities and results, including released code and datasets, as well as dissemination and training activities. All code is publicly released under the DeepSPIN GitHub organization (https://github.com/deep-spin).

In WP1 (“A planning mechanism for structural search”), we proposed and analyzed “Fenchel-Young losses”, establishing a connection between sparsity, generalized entropies, and margins (Blondel et al., AISTATS 2019, JMLR 2020; Martins et al., NeurIPS 2020, JMLR 2022). We developed a sparse sequence-to-sequence model which outputs sparse probabilities, in addition to using sparse attention, applied to neural machine translation and morphological inflection (Peters et al., ACL 2019, NAACL 2021). We also investigated strategies to overcome exposure bias (Mihaylova and Martins, SRW@ACL 2019) and proposed easy-first sequence-to-sequence models for automatic post-editing (Góis et al., EAMT 2020). Finally, we investigated higher-order factors for syntactic parsing with non-projective dependencies, in combination with neural network scorers (Fonseca and Martins, ACL 2020).
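
To make the central objects of this work package concrete, here is a brief sketch of the standard definitions involved (a sketch only, not the full constructions from the cited papers): sparsemax returns a sparse probability distribution by Euclidean projection of the scores onto the simplex, and each convex regularizer Ω induces a Fenchel-Young loss whose special cases include the softmax cross-entropy and sparsemax losses.

```latex
% Sparsemax: Euclidean projection of a score vector z onto the probability simplex.
\mathrm{sparsemax}(\mathbf{z}) \;=\; \operatorname*{arg\,min}_{\mathbf{p}\,\in\,\triangle^{d}} \; \|\mathbf{p}-\mathbf{z}\|^{2}

% Fenchel-Young loss induced by a convex regularizer \Omega (with convex conjugate \Omega^{*}).
% Taking \Omega to be the negative Shannon entropy recovers the softmax cross-entropy loss;
% taking \Omega to be the squared Euclidean norm recovers the sparsemax loss.
L_{\Omega}(\mathbf{z};\mathbf{y}) \;=\; \Omega^{*}(\mathbf{z}) + \Omega(\mathbf{y}) - \mathbf{z}^{\top}\mathbf{y}
```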

In WP2 (“Induction of structure inside the network”), we proposed a new form of attention that is both sparse and constrained, using it to model fertility in neural machine translation (Malaviya et al., ACL 2018). We also proposed a structured form of sparse attention called SparseMAP (Niculae et al., ICML 2018), leading to dynamic computation graphs (Niculae et al., EMNLP 2018). This work was later extended to accommodate arbitrary factor graphs (Niculae and Martins, ICML 2020), which makes it possible to endow neural network architectures with differentiable layers that induce latent structure subject to logic and budget constraints, with many potential applications in NLP. We participated in the SIGMORPHON shared task on morphological inflection (Peters and Martins, 2019), in which we ranked second and won the SIGMORPHON Interpretability Award. An extension of this work to multilingual morphological inflection and grapheme-to-phoneme conversion was published at the SIGMORPHON 2020 shared tasks (Peters and Martins, 2020); for the first task we ranked first, ex aequo with two other participating systems. We also investigated a hierarchical sparse attention model for document-level machine translation (Maruf et al., NAACL 2019) and context-aware machine translation for conversational data (Maruf et al., WMT 2018). We proposed a new evaluation framework and benchmark (MuDA) for context-aware machine translation, which assesses the ability of systems to capture discourse phenomena; this work received the best resource paper award at ACL 2023 (Fernandes et al., ACL 2023). We also developed a new variant of the Transformer architecture which adaptively learns the sparsity of its attention heads (Correia et al., EMNLP 2019). We initiated a line of work on explainability (Treviso et al., BlackboxNLP 2020; Fernandes et al., NeurIPS 2022; Treviso et al., ACL 2023). Finally, we published a series of works studying hallucinations in machine translation (Guerreiro et al., EACL 2023, ACL 2023, TACL 2023).
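
To illustrate what sparse attention means in practice, below is a minimal, self-contained PyTorch sketch of a single sparsemax attention head; the function and variable names are illustrative only, and the project's released implementations in the deep-spin GitHub repositories should be preferred over this toy version.

```python
import torch

def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax: Euclidean projection onto the simplex; unlike softmax,
    it can assign exactly zero probability to low-scoring entries."""
    z, _ = torch.sort(scores, dim=dim, descending=True)
    cumsum = z.cumsum(dim)
    k = torch.arange(1, scores.size(dim) + 1, device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)                                    # broadcast along `dim`
    support = (1 + k * z) > cumsum                       # entries kept in the support
    k_z = support.sum(dim=dim, keepdim=True).to(scores.dtype)
    tau = (torch.where(support, z, torch.zeros_like(z)).sum(dim, keepdim=True) - 1) / k_z
    return torch.clamp(scores - tau, min=0.0)

def sparse_attention(q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """One attention head whose weights are sparse probabilities."""
    weights = sparsemax(K @ q / K.size(-1) ** 0.5)       # many weights are exactly 0
    return weights @ V

q, K, V = torch.randn(64), torch.randn(10, 64), torch.randn(10, 64)
print(sparse_attention(q, K, V).shape)                   # torch.Size([64])
```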

In WP3 (“Weak supervision and data-driven regularization”), we developed a simple and effective approach to automatic post-editing by fine-tuning a pre-trained BERT model (Correia and Martins, 2019). We exploited a multi-task learning objective as a means of weak supervision (Martins et al., SRW@ACL 2019), and a joint extractive/compressive approach to summarization (Mendes et al., NAACL 2019). In collaboration with a team of engineers, linguists, and researchers at Unbabel, we extracted keystroke information from human post-editors on a crowd-sourced machine translation platform and created a new dataset of keystroke sequences, which was publicly released and analyzed (Góis and Martins, MT Summit 2019). We proposed a new chunk-based framework for efficient “on-the-fly” domain adaptation for machine translation using a semi-parametric retrieval-augmented approach (Martins et al., EMNLP 2022). We also proposed and released a new massively multilingual model (Glot500) which leverages the statistical strength of data in multiple languages (ImaniGooghari et al., ACL 2023); this work received an Area Chair best paper award at ACL 2023. Finally, we published a survey on efficient methods for natural language processing (Treviso et al., TACL 2023).
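
As a rough, simplified illustration of the semi-parametric retrieval-augmented idea behind the chunk-based domain adaptation work (a sketch under assumed components, not the actual system of Martins et al., EMNLP 2022): in-domain text is stored offline as embedded chunks, and at translation time the chunks nearest to the current source are retrieved to condition the model. The `encode` function below is a hypothetical stand-in for a real sentence encoder.

```python
import numpy as np

class ChunkDatastore:
    """In-domain chunks indexed by (L2-normalised) embeddings."""
    def __init__(self, chunk_embeddings: np.ndarray, chunks: list):
        self.emb = chunk_embeddings        # shape (num_chunks, dim)
        self.chunks = chunks

    def retrieve(self, query_emb: np.ndarray, k: int = 4) -> list:
        # cosine similarity reduces to a dot product on normalised vectors
        scores = self.emb @ query_emb
        return [self.chunks[i] for i in np.argsort(-scores)[:k]]

def encode(text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical placeholder encoder: a real system would use a trained model."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

chunks = ["in-domain chunk A", "in-domain chunk B"]
store = ChunkDatastore(np.stack([encode(c) for c in chunks]), chunks)
print(store.retrieve(encode("source sentence to translate"), k=1))
```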

In WP4 (“Application-based evaluation”), we co-organized the Quality Estimation shared tasks at the Conference on Machine Translation (WMT) in 2018–2022 (Specia et al., 2018, 2020; Fonseca et al., 2019; Zerva et al., 2021, 2022). We submitted systems to this shared task (in collaboration with the Unbabel team), which were the winning systems in several editions (starting with Kepler et al., WMT 2019). We were involved in the open-source OpenKiwi project, a framework for Quality Estimation developed in a collaboration between Unbabel and Instituto de Telecomunicações, which received the best system demonstration paper award at ACL 2019 (Kepler et al., ACL 2019). We also participated in and won the Automatic Post-Editing shared task at WMT 2019, in a joint submission between Unbabel and Instituto de Telecomunicações (Lopes et al., 2019). We developed a higher-order neural parser using dual decomposition inference (Fonseca and Martins, ACL 2020). We initiated a line of work on uncertainty-aware machine translation evaluation (Glushkova et al., EMNLP 2021; Zerva et al., EMNLP 2022). We also provided new metrics and analyses to better understand the failure modes of machine translation systems (Rei et al., ACL 2023; Glushkova et al., EAMT 2023).
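
As a simple illustration of one way to make machine translation evaluation uncertainty-aware, in the spirit of the cited work (a hedged sketch; the toy regressor below stands in for a real learned quality-estimation model): the model is run several times with dropout kept active (Monte Carlo dropout), and the spread of the resulting quality scores is reported alongside the mean as a confidence signal.

```python
import torch

class QualityRegressor(torch.nn.Module):
    """Toy stand-in for a learned quality-estimation model."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 32), torch.nn.ReLU(),
            torch.nn.Dropout(p=0.1), torch.nn.Linear(32, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

def mc_dropout_score(model: torch.nn.Module, features: torch.Tensor, n_samples: int = 30):
    model.train()                            # keep dropout active at inference time
    with torch.no_grad():
        samples = torch.stack([model(features) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)   # predicted quality and its uncertainty

model = QualityRegressor()
mean, std = mc_dropout_score(model, torch.randn(4, 16))
print(mean.shape, std.shape)                 # torch.Size([4]) torch.Size([4])
```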

Besides the work carried out in the work packages above, we also participated in several dissemination and training activities:
- Presentation of a tutorial on Latent Structure Models for NLP (https://deep-spin.github.io/tutorial/) at ACL 2019 and RANLP 2019. This tutorial evolved into a book currently under review.
- Co-organization of the 3rd, 4th, 5th, and 6th Workshop on Structured Prediction for NLP (2019-2022), co-located with NAACL, ACL, and EMNLP. Each year the workshop featured 5-6 invited talks from top researchers in the field and 1-2 contributed talks on deep structured prediction for NLP, a central topic in DeepSPIN.
- Co-organization of the Workshop on Deep Reinforcement Learning Meets Structured Prediction (https://sites.google.com/view/iclr2019-drlstructpred), co-located with ICLR 2019. It featured four invited talks from top researchers in the field and four contributed talks. Its topic (the intersection of deep reinforcement learning and structured prediction) is closely related to WP1 of DeepSPIN.
- Co-organization of the Human-Aided Translation workshop at the Machine Translation Summit, August 2019 (https://sites.google.com/unbabel.com/hat19/home). This workshop was co-organized with a team at Unbabel and featured six invited talks and a discussion panel on the interaction between machine translation and human post-editors.
- Lisbon Machine Learning School, 2018, 2019, 2020, 2021, 2022, 2023 (http://lxmls.it.pt). This summer school, which I co-organized, was held in Lisbon (the 2020 edition was fully virtual due to Covid-19).
- Invited talks at the Machine Learning Research School (MLRS 2019) in Bangkok and EurNLP 2019 in London.
- Three-day invited workshop at Instituto de Telecomunicações by guest lecturer Wilker Aziz (University of Amsterdam) on Deep Generative Models, 2019. It fostered an ongoing collaboration with Wilker Aziz's group at the University of Amsterdam. Two DeepSPIN workshops were also held in 2021, including an interdisciplinary one with the participation of Prof. Constantino Tsallis.
- Keynote talks at TALN Récital 2021 (https://talnrecital2021.inria.fr/), the TRITON 2021 conference (https://triton-conference.org/), the IDSAI 2022 conference (https://www.idsai.manchester.ac.uk/connect/events/conference/conference2022/conference-schedule/), and the 38th Conference of the Spanish Society for Natural Language Processing (https://sepln2022.grupolys.org/en/guest-speakers/).
- Invited talk at the IC Colloquium EPFL in October 2022, Lausanne, Switzerland (https://memento.epfl.ch/event/ic-colloquium-from-sparse-modeling-to-sparse-commu/)
- Keynote talk at the Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation) at EAMT in June 2023, Tampere, Finland (https://multi3generation.eu/workshops/eamt-2023/)
- Invited lecture on “Artificial Intelligence and Natural Language Processing” at the National Doctoral School in Artificial Intelligence, University of Rome “La Sapienza”.

Five doctoral dissertations have been completed and successfully defended within the scope of the project (Gonçalo Correia, 2022; Tsvetomila Mihaylova, 2022; Ben Peters, 2022; Pedro Martins, 2022; Marcos Treviso, 2023), four of which received the maximal grade (summa cum laude) and one of which received an important thesis award (the Adamastor Prize 2022).
To sum up, we have achieved the following results and progress beyond the state of the art (detailed above):
- New method for differentiable layers in neural networks capable of inducing latent structure and accommodating logic constraints (Niculae and Martins, ICML 2020; Niculae et al., EMNLP 2018, ICML 2018).
- New methods for producing explanations and rationales, increasing the interpretability of NLP systems (Fernandes et al., NeurIPS 2022; Treviso et al., BlackboxNLP 2020, ACL 2023).
- New methods for detecting and correcting hallucinations in machine translation (Guerreiro et al. EACL 2023, ACL 2023, TACL 2023).
- New loss functions and decoders that enable searching over a sparse set of structures (Blondel et al., AISTATS 2019, JMLR 2020; Peters et al., ACL 2019; Martins et al., JMLR 2023).
- New method for adaptive sparsity in transformer architectures for neural machine translation (Correia et al., EMNLP 2019, Peters et al., ACL 2019).
- New methods for sparse, structured, and constrained attention with gains in accuracy and interpretability (Malaviya et al., ACL 2018; Niculae et al., EMNLP 2018, ICML 2018; Peters et al., ACL 2019).
- New method using hierarchical attention for context-aware machine translation (Maruf et al., NAACL 2019).
- New attention mechanisms usable in continuous domains, to better model long-term memories and improve generation quality (Martins et al., ACL 2022).
- New method for evaluation of context-aware machine translation (Fernandes et al., ACL 2023), which received the best resource paper award at ACL 2023.
- Best system demo paper at ACL 2019 for open-source system OpenKiwi (Kepler et al., ACL 2019).
- Winning system in the WMT quality estimation shared task in 2019 and 2022, across all tracks (word-level, sentence-level, and document-level) and all language pairs (Kepler et al., WMT 2019).
- Winning system in the WMT 2019 automatic post-editing shared task for English-German (Lopes et al., WMT 2019).
- Interpretability Prize at SIGMORPHON (morphological inflection shared task) (Peters and Martins, SIGMORPHON 2019).
- Practical method based on transfer learning (fine-tuning a pre-trained BERT model) which achieved state-of-the-art results for automatic post-editing on the English-German WMT 2018 dataset (Correia and Martins, ACL 2019).
- New dataset with human translator post-editing actions and keystrokes (Góis and Martins, MT Summit 2019).