Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
- URL: http://arxiv.org/abs/2101.00416v1
- Date: Sat, 2 Jan 2021 10:27:11 GMT
- Title: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
- Authors: Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei
- Abstract summary: We propose Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
SSR provides more fine-grained learning signals for text representations by supervising the model to rewrite imperfect spans to ground truth.
Our experiments with T5 models on various seq2seq tasks show that SSR can substantially improve seq2seq pre-training.
- Score: 54.03356526990088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we generalize text infilling (e.g., masked language models) by
proposing Sequence Span Rewriting (SSR) as a self-supervised
sequence-to-sequence (seq2seq) pre-training objective. SSR provides more
fine-grained learning signals for text representations by supervising the model
to rewrite imperfect spans to ground truth, and it is more consistent than text
infilling with many downstream seq2seq tasks that rewrite a source sentence
into a target sentence. Our experiments with T5 models on various seq2seq tasks
show that SSR can substantially improve seq2seq pre-training. Moreover, we
observe that SSR is especially helpful for improving the pre-training of a small
seq2seq model with a powerful imperfect span generator, which suggests a new
perspective on transferring knowledge from a large model to a smaller model for
seq2seq pre-training.
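To make the objective concrete, below is a minimal sketch of how an SSR training pair could be constructed and used, assuming Hugging Face Transformers T5 checkpoints; the single-span corruption, sentinel-based span marking, checkpoint names, and generation settings are illustrative assumptions rather than the paper's exact recipe. A larger model serves as the imperfect span generator and a smaller model learns to rewrite its output back to the ground-truth span, reflecting the knowledge-transfer perspective noted above.

```python
# Minimal sketch of Sequence Span Rewriting (SSR) data construction, assuming
# Hugging Face Transformers T5 checkpoints. Single-span corruption, sentinel
# markers, and model sizes are illustrative assumptions, not the paper's recipe.
import random

from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
generator = T5ForConditionalGeneration.from_pretrained("t5-large")  # imperfect span generator
student = T5ForConditionalGeneration.from_pretrained("t5-small")    # model pre-trained with SSR


def make_ssr_example(text: str, span_len: int = 5):
    """Mask one span, let the generator fill it imperfectly, and build an SSR pair."""
    tokens = text.split()
    start = random.randrange(0, max(1, len(tokens) - span_len))
    ground_truth = " ".join(tokens[start:start + span_len])

    # Step 1: T5-style infilling -- replace the span with a sentinel and let the
    # generator propose a fill, which may be imperfect.
    masked = " ".join(tokens[:start] + ["<extra_id_0>"] + tokens[start + span_len:])
    gen_ids = generator.generate(
        **tokenizer(masked, return_tensors="pt"), max_new_tokens=20
    )
    imperfect = tokenizer.decode(gen_ids[0], skip_special_tokens=True).strip()

    # Step 2: SSR pair -- the source contains the machine-generated span marked
    # by sentinels; the target is the original ground-truth span.
    source = " ".join(
        tokens[:start] + ["<extra_id_0>", imperfect, "<extra_id_1>"] + tokens[start + span_len:]
    )
    target = f"<extra_id_0> {ground_truth} <extra_id_1>"
    return source, target


source, target = make_ssr_example(
    "the quick brown fox jumps over the lazy dog near the quiet river bank"
)
inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = student(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
```

In practice the paper corrupts many spans per sequence and pre-trains on large corpora; the pair construction above only illustrates the rewrite-to-ground-truth supervision signal.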
Related papers
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions to after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Self-Supervised Query Reformulation for Code Search [6.415583252034772]
We propose SSQR, a self-supervised query reformulation method that does not rely on any parallel query corpus.
Inspired by pre-trained models, SSQR treats query reformulation as a masked language modeling task.
arXiv Detail & Related papers (2023-07-01T08:17:23Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- The impact of memory on learning sequence-to-sequence tasks [6.603326895384289]
Recent success of neural networks in natural language processing has drawn renewed attention to learning sequence-to-sequence (seq2seq) tasks.
We propose a model for a seq2seq task that has the advantage of providing explicit control over the degree of memory, or non-Markovianity, in the sequences.
arXiv Detail & Related papers (2022-05-29T14:57:33Z)
- Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing [21.049557187137776]
We propose an improved consistency training paradigm for semi-supervised S2S ASR.
We utilize speech chain reconstruction as the weak augmentation to generate high-quality pseudo labels.
Our improved paradigm achieves a 12.2% CER improvement in the single-speaker setting and 38.6% in the multi-speaker setting.
arXiv Detail & Related papers (2022-05-14T04:26:13Z)
- Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition [25.93405777713522]
We investigate whether it is possible to employ the original architecture of attention-based ASR for ISR tasks.
We design an alternative student network that, instead of using a thinner or a shallower model, keeps the original architecture of the teacher model but with shorter sequences.
Our experiments show that by delaying the start of the recognition process by about 1.7 seconds, we can achieve performance comparable to a system that waits until the end.
arXiv Detail & Related papers (2020-11-04T05:06:01Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Structured Multimodal Attentions for TextVQA [57.71060302874151]
We propose an end-to-end structured multimodal attention (SMA) neural network to mainly solve the first two issues above.
SMA first uses a structural graph representation to encode the object-object, object-text and text-text relationships appearing in the image, and then designs a multimodal graph attention network to reason over it.
Our proposed model outperforms the SoTA models on the TextVQA dataset and two tasks of the ST-VQA dataset among all models except the pre-training-based TAP.
arXiv Detail & Related papers (2020-06-01T07:07:36Z)
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)