s2s-ft: Fine-Tuning Pretrained Transformer Encoders for
Sequence-to-Sequence Learning
- URL: http://arxiv.org/abs/2110.13640v1
- Date: Tue, 26 Oct 2021 12:45:34 GMT
- Title: s2s-ft: Fine-Tuning Pretrained Transformer Encoders for
Sequence-to-Sequence Learning
- Authors: Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei
- Abstract summary: We present a sequence-to-sequence fine-tuning toolkit s2s-ft, which adopts pretrained Transformers for conditional generation tasks.
s2s-ft achieves strong performance on several abstractive summarization and question generation benchmarks.
- Score: 47.30689555136054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained bidirectional Transformers, such as BERT, have achieved
significant improvements in a wide variety of language understanding tasks,
while it is not straightforward to directly apply them for natural language
generation. In this paper, we present a sequence-to-sequence fine-tuning
toolkit s2s-ft, which adopts pretrained Transformers for conditional generation
tasks. Inspired by UniLM, we implement three sequence-to-sequence fine-tuning
algorithms, namely, causal fine-tuning, masked fine-tuning, and pseudo-masked
fine-tuning. Experimental results show that, by leveraging existing pretrained
bidirectional Transformers, s2s-ft achieves strong performance on several
abstractive summarization and question generation benchmarks. Moreover, we
demonstrate that the package s2s-ft supports both monolingual and multilingual
NLG tasks. The s2s-ft toolkit is available at
https://github.com/microsoft/unilm/tree/master/s2s-ft.
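As a rough illustration of the idea shared by these fine-tuning algorithms, the sketch below constructs a UniLM-style sequence-to-sequence attention mask, in which source tokens attend bidirectionally to one another while target tokens attend to the full source and only to their own left context. This is a minimal sketch under stated assumptions: the function name, tensor shapes, and use of PyTorch are illustrative and do not reflect the actual s2s-ft API.

```python
# Illustrative sketch of a UniLM-style seq2seq attention mask (not the s2s-ft API).
import torch


def seq2seq_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Build an (L x L) boolean mask, L = src_len + tgt_len.

    mask[i, j] is True when position i may attend to position j:
      * source tokens attend bidirectionally within the source segment;
      * target tokens attend to the whole source plus the already-generated
        (left-context) part of the target, i.e. causally within the target.
    """
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Source segment: full bidirectional attention over the source.
    mask[:src_len, :src_len] = True

    # Target segment: attend to all source tokens ...
    mask[src_len:, :src_len] = True
    # ... and causally to target tokens up to the current position.
    causal = torch.tril(torch.ones(tgt_len, tgt_len)).bool()
    mask[src_len:, src_len:] = causal

    return mask


if __name__ == "__main__":
    m = seq2seq_attention_mask(src_len=3, tgt_len=2)
    print(m.int())
    # Expected output:
    # tensor([[1, 1, 1, 0, 0],
    #         [1, 1, 1, 0, 0],
    #         [1, 1, 1, 0, 0],
    #         [1, 1, 1, 1, 0],
    #         [1, 1, 1, 1, 1]])
```

With such a mask, a pretrained bidirectional encoder can be fine-tuned for conditional generation without changing its architecture; the three algorithms named above differ mainly in how target positions are masked and predicted during training.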
Related papers
- Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale [36.584680344291556]
We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts.
GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training.
GPST significantly outperforms existing unsupervised SLMs on left-to-right grammar induction.
arXiv Detail & Related papers (2024-03-13T06:54:47Z) - TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z) - Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z) - Transformer over Pre-trained Transformer for Neural Text Segmentation
with Enhanced Topic Coherence [6.73258176462356]
The proposed Transformer$^2$ model consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model that operates on the sentence embeddings.
Our experiments show that Transformer$^2$ manages to surpass state-of-the-art text segmentation models in terms of a commonly-used semantic coherence measure.
arXiv Detail & Related papers (2021-10-14T05:26:39Z) - Duplex Sequence-to-Sequence Learning for Reversible Machine Translation [53.924941333388155]
Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional.
We propose a duplex seq2seq neural network, REDER, and apply it to machine translation.
Experiments on widely-used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation.
arXiv Detail & Related papers (2021-05-07T18:21:57Z) - Adapting Pretrained Transformer to Lattices for Spoken Language
Understanding [39.50831917042577]
It is shown that encoding lattices, as opposed to 1-best results generated by an automatic speech recognizer (ASR), boosts the performance of spoken language understanding (SLU).
This paper aims at adapting pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language.
arXiv Detail & Related papers (2020-11-02T07:14:34Z) - Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU [19.812562421377706]
Bidirectional LSTMs and Transformers assume that the sequence to be encoded is available in full.
We investigate how they behave under incremental interfaces, when partial output must be provided.
Results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality.
arXiv Detail & Related papers (2020-10-11T19:51:21Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z) - Multi-level Head-wise Match and Aggregation in Transformer for Textual
Sequence Matching [87.97265483696613]
We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels.
Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
arXiv Detail & Related papers (2020-01-20T20:02:02Z)