Translating from Multiple Modalities into Text

Periodic Reporting for period 4 - TransModal (Translating from Multiple Modalities into Text)

Reporting period: 2021-03-01 to 2022-08-31

In this project we maintain that in order to render electronic data
more accessible to individuals and computers alike, new types of
translation models need to be developed. We take advantage of
recent advances in deep learning to induce
general representations for different modalities and learn how these
interact and can be rendered in natural language.

We detail below our specific objectives, each one relating to a particular Work Package.

A. Definition of Translation Task. We formally characterize the
translation process, study how it manifests itself in real data,
and devise novel algorithms that gather comparable corpora.

B. Modeling Framework. We formalize the
translation process following the encoder-decoder modeling
paradigm.
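
Concretely, an encoder maps the input modality into a continuous representation and a decoder generates text conditioned on it. The following is a minimal, purely illustrative sketch of this paradigm in PyTorch; all module names and sizes are assumptions rather than the project's actual implementation.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        # Minimal encoder-decoder: the encoder summarizes the source sequence,
        # the decoder generates output tokens conditioned on that summary.
        def __init__(self, src_vocab, tgt_vocab, dim=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, dim)
            self.encoder = nn.LSTM(dim, dim, batch_first=True)
            self.decoder = nn.LSTM(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            _, state = self.encoder(self.src_emb(src_ids))           # encode the input
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)  # condition generation on it
            return self.out(dec_out)                                 # per-step vocabulary logits

    model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
    logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1200, (2, 5)))
    print(logits.shape)  # torch.Size([2, 5, 1200])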

C. Development of Applications. We develop applications
representative of the different aspects of the translation problem.

The work performed from the beginning of the project and the main results achieved so far can be summarized as follows:

Natural Language Generation:

1. Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. We developed neural network architectures which incorporate content selection and planning without sacrificing end-to-end training.
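
As an illustration of the content selection step, the sketch below (hypothetical names, PyTorch) shows a learned gate that decides how much of each encoded data record is passed on to the planning stage; the actual architecture is richer than this.

    import torch
    import torch.nn as nn

    class ContentSelectionGate(nn.Module):
        # Each record attends to the other records, and a sigmoid gate decides
        # how much of the record is kept for the downstream content planner.
        def __init__(self, dim=128):
            super().__init__()
            self.attn = nn.Linear(dim, dim, bias=False)
            self.gate = nn.Linear(2 * dim, dim)

        def forward(self, records):                                # (batch, n_records, dim)
            scores = records @ self.attn(records).transpose(1, 2)
            context = torch.softmax(scores, dim=-1) @ records      # what the other records say
            g = torch.sigmoid(self.gate(torch.cat([records, context], dim=-1)))
            return g * records                                     # gated record representations

    selected = ContentSelectionGate()(torch.randn(2, 10, 128))
    print(selected.shape)  # torch.Size([2, 10, 128])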

2. We developed an entity-centric neural architecture for data-to-text generation. Our model creates entity-specific representations which are dynamically updated.
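
The gist of such dynamic updates can be sketched as follows (illustrative PyTorch, not the project's code): whenever an entity is mentioned, a gate interpolates its stored vector towards new content computed from the current decoder state.

    import torch
    import torch.nn as nn

    class EntityMemory(nn.Module):
        # Keeps one vector per entity and gates it towards the decoder state
        # whenever that entity is mentioned during generation.
        def __init__(self, dim=128):
            super().__init__()
            self.gate = nn.Linear(2 * dim, dim)
            self.update = nn.Linear(2 * dim, dim)

        def forward(self, entity, decoder_state):                  # both (batch, dim)
            pair = torch.cat([entity, decoder_state], dim=-1)
            g = torch.sigmoid(self.gate(pair))                     # how much to rewrite
            candidate = torch.tanh(self.update(pair))              # proposed new content
            return (1 - g) * entity + g * candidate                # updated entity vector

    updated = EntityMemory()(torch.randn(4, 128), torch.randn(4, 128))
    print(updated.shape)  # torch.Size([4, 128])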

3. A core step in statistical data-to-text generation concerns learning correspondences between structured data representations (e.g. facts in a database) and associated texts. We bootstrap generators from large-scale datasets where the data (e.g. DBPedia facts) and related texts (e.g. Wikipedia abstracts) are only loosely aligned: multi-instance learning automatically discovers correspondences between data and text pairs, and we show how these correspondences can be used to enhance the content signal while training an encoder-decoder architecture.
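
The multi-instance learning assumption can be pictured with the toy sketch below: each fact is paired with a bag of sentences from the loosely aligned abstract and is credited only with its best-matching sentence; the token-overlap similarity used here is a crude stand-in for the learned model.

    def align(facts, sentences, similarity):
        # For every fact, keep only its highest-scoring sentence (the MIL view).
        alignments = {}
        for fact in facts:
            best = max(sentences, key=lambda s: similarity(fact, s))
            alignments[fact] = (best, similarity(fact, best))
        return alignments

    def token_overlap(fact, sentence):
        # Toy similarity: fraction of fact tokens also present in the sentence.
        fact_tokens = set(fact.lower().split())
        return len(fact_tokens & set(sentence.lower().split())) / max(len(fact_tokens), 1)

    facts = ["birthPlace Edinburgh", "occupation novelist"]
    sentences = ["She was born in Edinburgh .", "She worked as a novelist and poet ."]
    print(align(facts, sentences, token_overlap))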

Semantic Parsing:

1. Semantic parsing aims at mapping natural language utterances into structured meaning representations. We proposed a structure-aware neural architecture which decomposes the semantic parsing process into two stages. Given an input utterance, we first generate a rough sketch of its meaning, where low-level information (such as variable names and arguments) is glossed over. Then, we fill in missing details by taking into account the natural language input and the sketch itself.
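
The two-stage idea can be illustrated with the hand-written toy below; in the actual parser both stages are neural decoders, whereas here they are simple rules meant only to mirror the structure (sketch first, details second).

    import re

    def generate_sketch(utterance):
        # Stage 1: predict a meaning sketch with placeholders for low-level details.
        if "older than" in utterance:
            return "filter_gt(ENTITY, age, VALUE)"
        return "UNK"

    def fill_details(utterance, sketch):
        # Stage 2: fill in the placeholders, conditioned on the input and the sketch.
        value = re.search(r"\d+", utterance).group()
        entity = "user" if "users" in utterance else "record"
        return sketch.replace("ENTITY", entity).replace("VALUE", value)

    utterance = "find all users older than 30"
    sketch = generate_sketch(utterance)
    print(sketch)                           # filter_gt(ENTITY, age, VALUE)
    print(fill_details(utterance, sketch))  # filter_gt(user, age, 30)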

2. We also focused on confidence modeling for neural semantic parsers which are built upon sequence-to-sequence models. We outline three major causes of uncertainty, and design various metrics to quantify these factors. These metrics are then used to estimate confidence scores that indicate whether model predictions are likely to be correct. Beyond confidence estimation, we identify which parts of the input contribute to uncertain predictions, allowing users to interpret their model and verify or refine its input.
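
The sketch below gives a flavour of such metrics with three toy proxies (the model's own probability for the parse, the variance of scores under repeated dropout runs, and the proportion of unseen input tokens); the way they are combined here is arbitrary and purely illustrative.

    import math
    import statistics

    def sequence_log_prob(token_log_probs):
        # Model-uncertainty proxy: length-normalised log-probability of the parse.
        return sum(token_log_probs) / len(token_log_probs)

    def dropout_variance(scores_from_dropout_runs):
        # Model-uncertainty proxy: score variance across stochastic forward passes.
        return statistics.pvariance(scores_from_dropout_runs)

    def unknown_token_ratio(tokens, vocabulary):
        # Input-uncertainty proxy: fraction of input tokens the parser has never seen.
        return sum(t not in vocabulary for t in tokens) / len(tokens)

    log_p = sequence_log_prob([-0.1, -0.3, -0.05])
    var = dropout_variance([-0.45, -0.52, -0.40, -0.61])
    unk = unknown_token_ratio("list flights from bna".split(), {"list", "flights", "from"})
    confidence = math.exp(log_p) * (1 - unk) / (1 + var)  # toy combination of the signals
    print(round(confidence, 3))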

3. We developed a neural semantic parser that maps natural language utterances onto logical forms that can be executed against a task-specific environment, such as a knowledge base or a database, to produce a response. The parser generates tree-structured logical forms with a transition-based approach, combining a generic tree-generation algorithm with a domain-general grammar defined by the logical language. The generation process is modeled by structured recurrent neural networks, which provide a rich encoding of the sentential context and generation history for making predictions.
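
The transition system can be pictured with the compact sketch below: NT opens a subtree, TER adds a leaf, and REDUCE closes the most recent open subtree. In the parser a neural model scores these actions; here the action sequence is simply given by hand.

    def build_tree(actions):
        stack = [("root", [])]
        for action, arg in actions:
            if action == "NT":            # open a new nonterminal subtree
                stack.append((arg, []))
            elif action == "TER":         # attach a terminal to the open subtree
                stack[-1][1].append(arg)
            elif action == "REDUCE":      # close the subtree and attach it to its parent
                label, children = stack.pop()
                stack[-1][1].append((label, children))
        return stack[0][1][0]

    actions = [("NT", "count"), ("NT", "filter"), ("TER", "flights"),
               ("TER", "from:bna"), ("REDUCE", None), ("REDUCE", None)]
    print(build_tree(actions))  # ('count', [('filter', ['flights', 'from:bna'])])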

Summarization:

1. We developed a neural framework for opinion summarization from online product reviews which is knowledge-lean and only requires light supervision (e.g. in the form of product domain labels and user-provided ratings). Our method combines two weakly supervised components to identify salient opinions and form
extractive summaries from multiple reviews: an aspect extractor trained under a multi-task objective, and a sentiment predictor based on multiple instance learning.
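
How the two signals might be combined is sketched below with invented scores: segments that mention a salient aspect and carry clear sentiment are ranked highest and selected for the extractive summary.

    def rank_segments(segments, top_k=2):
        # Rank review segments by aspect salience weighted by sentiment strength.
        scored = sorted(segments,
                        key=lambda s: s["aspect_salience"] * abs(s["sentiment"]),
                        reverse=True)
        return [s["text"] for s in scored[:top_k]]

    segments = [  # in the system these scores come from the two weakly supervised components
        {"text": "Battery life is outstanding.", "aspect_salience": 0.9, "sentiment": 0.8},
        {"text": "Arrived on a Tuesday.",        "aspect_salience": 0.1, "sentiment": 0.0},
        {"text": "The screen scratches easily.", "aspect_salience": 0.7, "sentiment": -0.6},
    ]
    print(rank_segments(segments))
    # ['Battery life is outstanding.', 'The screen scratches easily.']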

2. We introduced extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question “What is the article about?”. We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks.
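
A fragment of the topic-conditioning idea is sketched below (illustrative PyTorch): each word embedding is concatenated with the document's topic distribution before the convolutions; the abstractive decoder and the real model details are omitted.

    import torch
    import torch.nn as nn

    class TopicConvEncoder(nn.Module):
        # Convolutional document encoder whose inputs are word embeddings
        # concatenated with a document-level topic distribution.
        def __init__(self, vocab=5000, dim=128, n_topics=50):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.conv = nn.Conv1d(dim + n_topics, dim, kernel_size=3, padding=1)

        def forward(self, token_ids, topic_dist):                  # (batch, seq), (batch, n_topics)
            x = self.emb(token_ids)                                # (batch, seq, dim)
            topics = topic_dist.unsqueeze(1).expand(-1, x.size(1), -1)
            x = torch.cat([x, topics], dim=-1).transpose(1, 2)
            return torch.relu(self.conv(x)).transpose(1, 2)        # (batch, seq, dim)

    enc = TopicConvEncoder()
    out = enc(torch.randint(0, 5000, (2, 40)), torch.softmax(torch.randn(2, 50), dim=-1))
    print(out.shape)  # torch.Size([2, 40, 128])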

Text Rewriting:

1. Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically
extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by “pivoting” over a shared translation in another
language. In this work we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks.
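
A schematic sketch of the pivoting step is given below; translate_to_pivot and translate_from_pivot are hypothetical stand-ins for trained translation models (not real library calls), and the toy lambdas only make the example runnable.

    def paraphrase(sentence, translate_to_pivot, translate_from_pivot, n_candidates=5):
        # Round-trip through the pivot language; alternative back-translations
        # that differ from the input are kept as paraphrase candidates.
        pivot = translate_to_pivot(sentence)
        candidates = translate_from_pivot(pivot, n_best=n_candidates)
        return [c for c in candidates if c.lower() != sentence.lower()]

    # Toy stand-ins so the sketch runs; a real setup would plug in NMT systems here.
    fake_to_pivot = lambda s: "<pivot translation of: %s>" % s
    fake_from_pivot = lambda p, n_best: ["How old is this building?",
                                         "What is the age of this building?"][:n_best]
    print(paraphrase("How old is this building?", fake_to_pivot, fake_from_pivot))
    # ['What is the age of this building?']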

2. We advocate the use of bilingual corpora which are abundantly available for training sentence compression models. Our approach borrows much of its machinery from neural machine translation and leverages bilingual pivoting: compressions are obtained by translating a source string into a foreign language and then back-translating it
into the source while controlling the translation length. Our model can be trained for any language as long as a bilingual corpus is available and performs arbitrary rewrites without access to compression-specific data.

3. Question answering (QA) systems are sensitive to the many different ways natural language expresses the same information need. We turn to paraphrases as a means of capturing this knowledge and present a general framework which learns felicitous paraphrases for various QA tasks. Our method is trained end-to-end using question-answer pairs as a supervision signal.
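
One way to picture how paraphrases enter the QA model is the marginalisation sketched below, where the answer score sums over paraphrases of the question weighted by how felicitous each paraphrase is; all probabilities here are invented.

    def answer_distribution(paraphrase_scores, answer_probs_per_paraphrase):
        # p(answer | question) = sum over paraphrases q' of p(q' | q) * p(answer | q')
        combined = {}
        for q, weight in paraphrase_scores.items():
            for answer, p in answer_probs_per_paraphrase[q].items():
                combined[answer] = combined.get(answer, 0.0) + weight * p
        return combined

    paraphrase_scores = {"when was obama born": 0.7, "what is obama's birth year": 0.3}
    answer_probs = {"when was obama born": {"1961": 0.6, "1967": 0.4},
                    "what is obama's birth year": {"1961": 0.9, "1967": 0.1}}
    print(answer_distribution(paraphrase_scores, answer_probs))
    # roughly {'1961': 0.69, '1967': 0.31}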

By the end of the project we aim to have state-of-the-art results in various areas:

1. We are currently developing a robust graph-to-text generator for data-to-text generation, which will be released to the scientific community with a dedicated resource page and leaderboard.

2. We have also made several advances in single- and multi-document summarization (abstractive and extractive). Recent work focuses on transfer learning for summarization using the recently released BERT model. Initial results have been very promising, showing that transfer learning can boost summarization performance across the board for various types of summaries and languages.

3. We plan to build semantic parsers which operate not only across various meaning representations but also across languages, provided training data is available.

4. We have exploited the potential of neural machine translation for text rewriting. We have produced datasets for sentence compression and plan to show that our models also work for simplification, across languages.