DeltaLM: Encoder-Decoder Pre-training for Language Generation and
Translation by Augmenting Pretrained Multilingual Encoders
- URL: http://arxiv.org/abs/2106.13736v1
- Date: Fri, 25 Jun 2021 16:12:10 GMT
- Authors: Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio,
Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
- Abstract summary: We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
- Score: 92.90543340071007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pretrained encoders have achieved success in various natural language
understanding (NLU) tasks, there is a gap between these pretrained encoders and
natural language generation (NLG). NLG tasks are often based on the
encoder-decoder framework, where the pretrained encoders can only benefit part
of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual
encoder-decoder model that regards the decoder as the task layer of
off-the-shelf pretrained encoders. Specifically, we augment the pretrained
multilingual encoder with a decoder and pre-train it in a self-supervised way.
To take advantage of both the large-scale monolingual data and bilingual data,
we adopt the span corruption and translation span corruption as the
pre-training tasks. Experiments show that DeltaLM outperforms various strong
baselines on both natural language generation and translation tasks, including
machine translation, abstractive text summarization, data-to-text, and question
generation.
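
The two pre-training objectives can be made concrete with a short sketch. The Python snippet below is an illustrative approximation rather than the authors' implementation; the sentinel-token format, masking ratio, and span length are assumptions in the style of T5-like span corruption. Span corruption masks random spans of a monolingual sequence and trains the decoder to reconstruct them, while translation span corruption applies the same corruption to a concatenated translation pair so that bilingual data is used as well.

```python
import random

def span_corrupt(tokens, mask_ratio=0.15, mean_span_len=3, seed=None):
    """Replace random spans with sentinel tokens (T5-style, illustrative only).

    Returns (corrupted_source, reconstruction_target).
    """
    rng = random.Random(seed)
    budget = max(1, int(len(tokens) * mask_ratio))  # how many tokens to mask in total
    source, target = [], []
    i, sentinel_id, masked = 0, 0, 0
    while i < len(tokens):
        if masked < budget and rng.random() < mask_ratio:
            span = max(1, min(mean_span_len, len(tokens) - i))
            sentinel = f"<extra_id_{sentinel_id}>"
            source.append(sentinel)              # span is hidden on the encoder side
            target.append(sentinel)
            target.extend(tokens[i:i + span])    # and reconstructed on the decoder side
            sentinel_id += 1
            masked += span
            i += span
        else:
            source.append(tokens[i])
            i += 1
    return source, target

def translation_span_corrupt(src_tokens, tgt_tokens, **kwargs):
    """Apply span corruption to the concatenated translation pair, so masked
    spans may come from either language (illustrative sketch)."""
    return span_corrupt(src_tokens + ["</s>"] + tgt_tokens, **kwargs)

src = "DeltaLM augments a pretrained multilingual encoder with a decoder".split()
tgt = "DeltaLM erweitert einen vortrainierten mehrsprachigen Encoder um einen Decoder".split()
print(span_corrupt(src, seed=0))
print(translation_span_corrupt(src, tgt, seed=0))
```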
Related papers
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims to generate a target sequence from a given source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Is Encoder-Decoder Redundant for Neural Machine Translation? [44.37101354412253]
The encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
arXiv Detail & Related papers (2022-10-21T08:33:55Z)
- Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders [77.2101943305862]
We propose a deep encoder with multiple shallow decoders (DEMSD) where each shallow decoder is responsible for a disjoint subset of target languages.
A DEMSD model with 2-layer decoders obtains a 1.8x speedup on average compared to a standard Transformer model, with no drop in translation quality.
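
As a rough picture of the deep-encoder/shallow-decoders idea, the sketch below uses standard PyTorch modules; the layer counts, language groups, and class names are illustrative assumptions, not the paper's configuration. The shared deep encoder runs once per source sentence, and the target language selects one of several 2-layer decoders.

```python
import torch
import torch.nn as nn

class DEMSDSketch(nn.Module):
    """Deep shared encoder plus several shallow decoders, one per language group."""

    def __init__(self, vocab=32000, d_model=512, enc_layers=12, dec_layers=2,
                 lang_groups=({"de", "fr"}, {"zh", "ja"})):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=enc_layers)
        # One shallow decoder per disjoint subset of target languages.
        self.decoders = nn.ModuleList(
            nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=dec_layers)
            for _ in lang_groups)
        self.lang2dec = {lang: i for i, group in enumerate(lang_groups) for lang in group}
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, src_ids, tgt_ids, tgt_lang):
        memory = self.encoder(self.embed(src_ids))        # deep shared encoding
        decoder = self.decoders[self.lang2dec[tgt_lang]]   # route to a shallow decoder
        return self.proj(decoder(self.embed(tgt_ids), memory))

model = DEMSDSketch()
logits = model(torch.randint(0, 32000, (2, 7)), torch.randint(0, 32000, (2, 5)), "de")
print(logits.shape)  # torch.Size([2, 5, 32000])
```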
arXiv Detail & Related papers (2022-06-05T01:15:04Z)
- CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation [38.02741711554989]
Chinese Pre-trained Unbalanced Transformer (CPT) is designed for both natural language understanding (NLU) and natural language generation (NLG) tasks.
CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder.
With a partially shared architecture and multi-task pre-training, CPT can learn task-specific knowledge for both NLU and NLG with its two decoders.
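
The three-part design can be sketched structurally as follows. This is a hedged approximation with made-up sizes and module names, not the released CPT code: a shared encoder feeds either a shallow understanding branch with a classification head or a shallow autoregressive generation branch with a language-model head, depending on the task.

```python
import torch
import torch.nn as nn

class CPTSketch(nn.Module):
    """Shared encoder with an understanding branch and a generation branch (illustrative)."""

    def __init__(self, vocab=21128, d_model=768, n_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.shared_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True), num_layers=10)
        # Understanding branch: shallow bidirectional stack plus a classifier head.
        self.u_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True), num_layers=2)
        self.classifier = nn.Linear(d_model, n_labels)
        # Generation branch: shallow autoregressive decoder with a language-model head.
        self.g_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=12, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, src_ids, task, tgt_ids=None):
        memory = self.shared_encoder(self.embed(src_ids))
        if task == "nlu":
            return self.classifier(self.u_decoder(memory)[:, 0])  # first-token pooling
        return self.lm_head(self.g_decoder(self.embed(tgt_ids), memory))

model = CPTSketch()
print(model(torch.randint(0, 21128, (2, 8)), "nlu").shape)   # torch.Size([2, 2])
print(model(torch.randint(0, 21128, (2, 8)), "nlg",
            torch.randint(0, 21128, (2, 4))).shape)           # torch.Size([2, 4, 21128])
```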
arXiv Detail & Related papers (2021-09-13T06:25:45Z)
- Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation [11.570746514243117]
We introduce another decoder, called the seer decoder, into the encoder-decoder framework during training.
We force the conventional decoder to simulate the behaviors of the seer decoder via knowledge distillation.
Experiments show that our method significantly outperforms competitive baselines and achieves larger improvements on bigger datasets.
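
The distillation from the seer decoder to the conventional decoder can be written as one combined loss term. The function below is an illustrative sketch; the tensor names, KL formulation, and mixing weight are assumptions rather than the paper's exact objective. The conventional decoder is trained on the references while also being pulled toward the seer decoder's output distribution.

```python
import torch
import torch.nn.functional as F

def seer_distillation_loss(student_logits, seer_logits, target_ids,
                           alpha=0.5, temperature=1.0, pad_id=0):
    """Cross-entropy on references plus KL toward the (detached) seer decoder.

    student_logits, seer_logits: (batch, tgt_len, vocab); target_ids: (batch, tgt_len).
    """
    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.view(-1, vocab), target_ids.view(-1),
                         ignore_index=pad_id)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(seer_logits.detach() / temperature, dim=-1),  # seer acts as teacher
        reduction="batchmean")
    return (1 - alpha) * ce + alpha * kd

# Toy usage with random tensors.
student, seer = torch.randn(2, 5, 100), torch.randn(2, 5, 100)
refs = torch.randint(1, 100, (2, 5))
print(seer_distillation_loss(student, seer, refs))
```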
arXiv Detail & Related papers (2021-06-12T11:38:40Z)
- Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network [99.03895740754402]
We propose a two-stream decoupled encoder-decoder design, consisting of a decoupled cross-modal encoder and decoder.
As an alternative, we propose a primary scheduled sampling strategy that mitigates the discrepancy between training and inference by pretraining the encoder-decoder in a two-pass manner.
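
Scheduled sampling, in general, mixes ground-truth tokens with the model's own predictions during training to reduce the mismatch with inference; the two-pass idea mentioned above can be sketched as below. The `decoder` callable and the sampling probability are hypothetical placeholders rather than the paper's cross-modal setup.

```python
import torch

def two_pass_scheduled_sampling(decoder, memory, gold_ids, sample_prob=0.25):
    """Pass 1 decodes with gold inputs; pass 2 mixes in pass-1 predictions.

    decoder(input_ids, memory) -> logits of shape (batch, tgt_len, vocab).
    """
    with torch.no_grad():
        first_pass_logits = decoder(gold_ids, memory)
        predicted = first_pass_logits.argmax(dim=-1)
    # Randomly replace a fraction of gold inputs with the model's own predictions.
    replace = torch.rand(gold_ids.shape, device=gold_ids.device) < sample_prob
    mixed_ids = torch.where(replace, predicted, gold_ids)
    return decoder(mixed_ids, memory)  # second pass, used for the training loss

# Toy usage with a stand-in decoder that ignores the encoder memory.
toy_decoder = lambda ids, mem: torch.randn(ids.size(0), ids.size(1), 50)
print(two_pass_scheduled_sampling(toy_decoder, None, torch.randint(0, 50, (2, 6))).shape)
```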
arXiv Detail & Related papers (2021-01-27T17:36:57Z)
- On the Sub-Layer Functionalities of Transformer Decoder [74.83087937309266]
We study how Transformer-based decoders leverage information from the source and target languages.
Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance.
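
The observation that the residual feed-forward module can be dropped is easy to picture with a stripped-down decoder layer. The PyTorch module below is an illustrative sketch rather than the paper's code: only the masked self-attention and cross-attention sub-layers remain, each with its residual connection and layer normalization.

```python
import torch
import torch.nn as nn

class DecoderLayerNoFFN(nn.Module):
    """Transformer decoder layer with the feed-forward sub-layer removed (illustrative)."""

    def __init__(self, d_model=512, nhead=8, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, memory, tgt_mask=None):
        # Masked self-attention over the target prefix, with residual + norm.
        sa, _ = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)
        tgt = self.norm1(tgt + self.dropout(sa))
        # Cross-attention over the encoder output, with residual + norm.
        ca, _ = self.cross_attn(tgt, memory, memory)
        # The position-wise feed-forward block is intentionally omitted.
        return self.norm2(tgt + self.dropout(ca))

layer = DecoderLayerNoFFN()
out = layer(torch.randn(2, 5, 512), torch.randn(2, 7, 512))
print(out.shape)  # torch.Size([2, 5, 512])
```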
arXiv Detail & Related papers (2020-10-06T11:50:54Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, joint training with two target ends encourages the shared encoder to produce a language-independent semantic space.
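
One way to read the bi-decoder training is as a sum of two sequence losses over a shared encoder. The sketch below is a hedged illustration with hypothetical encoder/decoder callables, not the BiDAN implementation: one decoder generates the target language, the other reconstructs the source, and the combined gradient pushes the shared encoder toward a language-independent representation.

```python
import torch.nn.functional as F

def bidan_joint_loss(encoder, dec_tgt, dec_src, src_ids, tgt_ids, pad_id=0):
    """Joint loss over two decoders that share one encoder (illustrative).

    encoder(src_ids) -> memory; each decoder(prefix_ids, memory) -> (batch, len, vocab).
    dec_tgt generates the target language, dec_src reconstructs the source language.
    """
    memory = encoder(src_ids)  # shared representation fed to both decoders

    def ce(logits, labels):
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               labels.reshape(-1), ignore_index=pad_id)

    # Teacher forcing: predict position t+1 from the prefix up to position t.
    loss_translate = ce(dec_tgt(tgt_ids[:, :-1], memory), tgt_ids[:, 1:])
    loss_reconstruct = ce(dec_src(src_ids[:, :-1], memory), src_ids[:, 1:])
    return loss_translate + loss_reconstruct
```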
arXiv Detail & Related papers (2020-01-14T02:05:14Z)