E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language
Understanding and Generation
- URL: http://arxiv.org/abs/2205.14912v3
- Date: Tue, 9 Jan 2024 09:44:10 GMT
- Title: E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language
Understanding and Generation
- Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du and Dacheng Tao
- Abstract summary: Sequence-to-sequence (seq2seq) learning is a popular approach to large-scale language model pretraining.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves seq2seq models by integrating more effective self-supervised signals into the encoder.
- Score: 95.49128988683191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence (seq2seq) learning is a popular approach to large-scale
language model pretraining. However, prior seq2seq pretraining models
generally focus on reconstructive objectives on the decoder side and neglect
the effect of encoder-side supervision, which we argue may lead to sub-optimal
performance. To verify our hypothesis, we first empirically study the
functionalities of the encoder and decoder in seq2seq pretrained language
models, and find that the encoder plays an important yet under-exploited
role relative to the decoder in terms of downstream performance and neuron
activation. Therefore, we propose an encoding-enhanced seq2seq pretraining
strategy, namely E2S2, which improves seq2seq models by integrating more
effective self-supervised signals into the encoder. Specifically, E2S2
adopts two self-supervised objectives on the encoder side from two aspects: 1)
locally denoising the corrupted sentence (denoising objective); and 2) globally
learning better sentence representations (contrastive objective). With the help
of both objectives, the encoder can effectively distinguish noise tokens
and capture high-level (i.e., syntactic and semantic) knowledge, thus
strengthening the seq2seq model's ability to perform accurate conditional
generation. Across a diverse range of downstream natural language
understanding and generation tasks, E2S2 consistently improves the performance of
its powerful backbone models, e.g., BART and T5. For example, with the BART
backbone, we achieve a +1.1% average gain on the General Language Understanding
Evaluation (GLUE) benchmark and a +1.75% F_0.5 score improvement on the
CoNLL-2014 dataset. We also provide in-depth analyses showing that the improvement
stems from better linguistic representations. We hope that our work will foster future
self-supervision research on seq2seq language model pretraining.
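
The abstract describes the training signal only at a high level. Below is a minimal sketch of how the three losses it mentions (decoder-side reconstruction, encoder-side denoising, and an encoder-side contrastive objective) could be combined; the two-view pooling, the InfoNCE form of the contrastive term, and the weights alpha/beta are illustrative assumptions rather than the authors' exact formulation.

```python
# Illustrative sketch (not the authors' code) of an E2S2-style loss:
# decoder reconstruction plus two encoder-side objectives,
# (1) token-level denoising and (2) an InfoNCE-style contrastive loss.
import torch
import torch.nn.functional as F

def e2s2_style_loss(dec_logits, dec_labels,
                    enc_denoise_logits, enc_denoise_labels,
                    enc_pooled, enc_pooled_aug,
                    temperature=0.1, alpha=1.0, beta=1.0):
    """dec_logits: (B, T_dec, V) decoder outputs for reconstruction.
    enc_denoise_logits: (B, T_enc, V) encoder head predicting original tokens.
    enc_pooled / enc_pooled_aug: (B, H) two views of each sentence's encoding.
    Label tensors use -100 at positions that should be ignored."""
    # 1) standard seq2seq reconstruction loss on the decoder side
    rec = F.cross_entropy(dec_logits.transpose(1, 2), dec_labels, ignore_index=-100)
    # 2) local denoising: recover the original tokens at corrupted encoder positions
    den = F.cross_entropy(enc_denoise_logits.transpose(1, 2), enc_denoise_labels,
                          ignore_index=-100)
    # 3) global contrastive: pull two views of the same sentence together,
    #    push apart views of different sentences (in-batch negatives)
    z1 = F.normalize(enc_pooled, dim=-1)
    z2 = F.normalize(enc_pooled_aug, dim=-1)
    sim = z1 @ z2.t() / temperature                     # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    con = F.cross_entropy(sim, targets)
    return rec + alpha * den + beta * con
```

Under this reading, the decoder keeps its usual reconstruction role while the encoder receives both a local (token-level) and a global (sentence-level) training signal, which matches the denoising and contrastive objectives described in the abstract.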
Related papers
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders with a mixed objective that leverages both the randomness of masked language modeling and the structural aspects of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
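
A rough sketch of contrastive learning with mined hard negatives in addition to in-batch negatives, as described above, might look as follows; the tensor shapes, temperature, and the per-anchor set of K hard negatives are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of InfoNCE with hard negatives appended to in-batch negatives.
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(anchor, positive, hard_negatives, temperature=0.05):
    """anchor, positive: (B, H) embeddings of two views of the same code snippet.
    hard_negatives: (B, K, H) embeddings of K mined hard negatives per anchor."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(hard_negatives, dim=-1)
    pos_sim = (a * p).sum(-1, keepdim=True)                   # (B, 1) positive pair
    in_batch = a @ p.t()                                       # (B, B) in-batch negatives
    mask = torch.eye(a.size(0), dtype=torch.bool, device=a.device)
    in_batch = in_batch.masked_fill(mask, float("-inf"))       # diagonal is the positive; drop it
    hard_sim = torch.einsum("bh,bkh->bk", a, n)                # (B, K) hard negatives
    logits = torch.cat([pos_sim, in_batch, hard_sim], dim=1) / temperature
    labels = torch.zeros(a.size(0), dtype=torch.long, device=a.device)  # positive is column 0
    return F.cross_entropy(logits, labels)
```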
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to seq2seq tasks.
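
As a small illustration of the decoder-only route mentioned above, one common recipe is to concatenate source and target into a single sequence and compute the language-modeling loss only on the target portion; the helper below is a hypothetical sketch of that packing step, not code from the paper.

```python
# Illustrative packing of a seq2seq example for a decoder-only (causal) LM:
# the model sees source + separator + target, but the loss is masked on the source.
def build_decoder_only_example(src_ids, tgt_ids, sep_id, ignore_index=-100):
    """src_ids, tgt_ids: lists of token ids; sep_id: separator token id."""
    input_ids = src_ids + [sep_id] + tgt_ids
    labels = [ignore_index] * (len(src_ids) + 1) + tgt_ids   # no loss on the source side
    return input_ids, labels
```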
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
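
A hedged sketch of the two pre-training objectives named above, replaced token detection and replaced token denoising, is given below; the tensor layout and equal loss weighting are assumptions, and the generator/discriminator interplay of the full GAN-style setup is omitted.

```python
# Illustrative sketch (not GanLM's code) of its two objectives.
import torch
import torch.nn.functional as F

def ganlm_style_losses(disc_logits, replaced_mask, denoise_logits, original_ids):
    """disc_logits: (B, T) scores saying "this input token was replaced".
    replaced_mask: (B, T) float tensor, 1.0 where the token was replaced, else 0.0.
    denoise_logits: (B, T, V) predictions of the original token.
    original_ids: (B, T) gold token ids (ignored where nothing was replaced)."""
    # replaced token detection: binary classification over every position
    rtd = F.binary_cross_entropy_with_logits(disc_logits, replaced_mask)
    # replaced token denoising: recover the original token only at replaced positions
    labels = original_ids.masked_fill(replaced_mask == 0, -100)
    rtdn = F.cross_entropy(denoise_logits.transpose(1, 2), labels, ignore_index=-100)
    return rtd + rtdn
```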
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
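
One plausible way to induce such a pseudo language is sketched below, under the assumption of k-means clustering over frame-level acoustic features with consecutive duplicate units merged; the actual Wav2Seq pipeline may differ in its feature extractor and unit post-processing.

```python
# Rough sketch (assumptions only) of inducing pseudo-token transcripts from speech.
import numpy as np
from sklearn.cluster import KMeans
from itertools import groupby

def induce_pseudo_transcripts(feature_seqs, n_units=500, seed=0):
    """feature_seqs: list of (T_i, D) arrays of acoustic features, one per utterance."""
    km = KMeans(n_clusters=n_units, random_state=seed, n_init=10)
    km.fit(np.concatenate(feature_seqs, axis=0))       # learn the discrete unit inventory
    transcripts = []
    for feats in feature_seqs:
        units = km.predict(feats)                       # one unit id per frame
        deduped = [int(u) for u, _ in groupby(units)]   # merge consecutive repeats
        transcripts.append(deduped)                     # pseudo tokens as seq2seq targets
    return transcripts
```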
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Improving End-to-End Models for Set Prediction in Spoken Language Understanding [26.781489293420055]
We propose a novel data augmentation technique along with an implicit attention-based alignment method to infer the spoken order.
F1 scores increased by more than 11% for RNN-T and about 2% for attention-based encoder-decoder SLU models, outperforming previously reported results.
arXiv Detail & Related papers (2022-01-28T13:23:17Z)
- Regularized Training of Nearest Neighbor Language Models [10.994336081018043]
We build upon $k$NN-LM (Khandelwal et al., 2020), which uses a pre-trained language model together with an exhaustive $k$NN search through the training data (memory bank) to achieve state-of-the-art results.
We find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.
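
For context, here is a minimal sketch of the underlying $k$NN-LM interpolation (the component the paper regularizes, not its regularization itself); the memory-bank layout, distance temperature, and mixing weight lam are illustrative assumptions.

```python
# Hedged sketch of kNN-LM style interpolation: mix the base LM distribution with a
# distribution induced by nearest neighbours from a (hidden state, next token) memory bank.
import torch
import torch.nn.functional as F

def knn_lm_interpolate(lm_log_probs, query, bank_keys, bank_next_tokens,
                       k=8, lam=0.25, temperature=1.0):
    """lm_log_probs: (V,) base model log-probabilities for the next token.
    query: (H,) current hidden state; bank_keys: (N, H) stored states;
    bank_next_tokens: (N,) long tensor of the token that followed each stored state."""
    dists = torch.cdist(query[None, None], bank_keys[None]).squeeze(0).squeeze(0)  # (N,)
    knn = torch.topk(-dists, k)                                  # k nearest by L2 distance
    weights = F.softmax(-dists[knn.indices] / temperature, dim=0)
    knn_probs = torch.zeros_like(lm_log_probs)
    knn_probs.scatter_add_(0, bank_next_tokens[knn.indices], weights)
    return lam * knn_probs + (1 - lam) * lm_log_probs.exp()      # mixed next-token distribution
```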
arXiv Detail & Related papers (2021-09-16T23:20:24Z)
- Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization [15.367455931848252]
We present a sequence-to-sequence (seq2seq) autoencoder via contrastive learning for abstractive text summarization.
Our model adopts a standard Transformer-based architecture with a multi-layer bi-directional encoder and an auto-regressive decoder.
We conduct experiments on two datasets and demonstrate that our model outperforms many existing baselines.
arXiv Detail & Related papers (2021-08-26T18:45:13Z)
- Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition [9.732767611907068]
In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model.
Our model achieves better recognition performance on CALLHOME corpus (15 hours) than other end-to-end models.
arXiv Detail & Related papers (2021-01-17T16:12:44Z)
- Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder [64.55176104620848]
We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on the shared speech encoder.
The latter is used to select the best translation among candidates of various lengths generated by the former, which dramatically improves the effectiveness of a large length beam with negligible overhead.
Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality.
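
The selection step described above can be sketched as picking, among the NAR decoder's per-length hypotheses, the candidate that scores best under the shared AR decoder; the helper below is a hypothetical illustration, with ar_log_prob standing in for that scoring function.

```python
# Illustrative rescoring of NAR length-beam candidates with an AR decoder's score.
def pick_best_candidate(nar_candidates, ar_log_prob, length_penalty=0.0):
    """nar_candidates: list of token-id sequences, one per length in the length beam.
    ar_log_prob: callable mapping a candidate sequence to its AR decoder log-probability."""
    def score(cand):
        return ar_log_prob(cand) / (max(len(cand), 1) ** length_penalty)
    return max(nar_candidates, key=score)
```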
arXiv Detail & Related papers (2020-10-25T06:35:30Z)