Improving Non-autoregressive Generation with Mixup Training
- URL: http://arxiv.org/abs/2110.11115v1
- Date: Thu, 21 Oct 2021 13:04:21 GMT
- Title: Improving Non-autoregressive Generation with Mixup Training
- Authors: Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang,
Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
- Abstract summary: We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target.
Our experiments on three generation benchmarks including question generation, summarization and paraphrase generation, show that the proposed framework achieves the new state-of-the-art results.
- Score: 51.61038444990301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pre-trained language models have achieved great success on various
natural language understanding tasks, how to effectively leverage them into
non-autoregressive generation tasks remains a challenge. To solve this problem,
we present a non-autoregressive generation model based on pre-trained
transformer models. To bridge the gap between autoregressive and
non-autoregressive models, we propose a simple and effective iterative training
method called MIx Source and pseudo Target (MIST). Unlike other iterative
decoding methods, which sacrifice the inference speed to achieve better
performance based on multiple decoding iterations, MIST works in the training
stage and has no effect on inference time. Our experiments on three generation
benchmarks including question generation, summarization and paraphrase
generation, show that the proposed framework achieves the new state-of-the-art
results for fully non-autoregressive models. We also demonstrate that our
method can be used to a variety of pre-trained models. For instance, MIST based
on the small pre-trained model also obtains comparable performance with seq2seq
models.
Related papers
- Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
arXiv Detail & Related papers (2023-10-25T03:40:50Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large
Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z) - SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z) - Enhancing Text Generation with Cooperative Training [23.971227375706327]
Most prevailing methods trained generative and discriminative models in isolation, which left them unable to adapt to changes in each other.
We introduce a textitself-consistent learning framework in the text field that involves training a discriminator and generator cooperatively in a closed-loop manner.
Our framework are able to mitigate training instabilities such as mode collapse and non-convergence.
arXiv Detail & Related papers (2023-03-16T04:21:19Z) - MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z) - METRO: Efficient Denoising Pretraining of Large Scale Autoencoding
Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO)
The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
arXiv Detail & Related papers (2022-04-13T21:39:15Z) - Bridging Pre-trained Models and Downstream Tasks for Source Code
Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.