Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves
Non-Autoregressive Translators
- URL: http://arxiv.org/abs/2211.06075v1
- Date: Fri, 11 Nov 2022 09:10:14 GMT
- Title: Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves
Non-Autoregressive Translators
- Authors: Xinyou Wang, Zaixiang Zheng, Shujian Huang
- Abstract summary: The probabilistic framework of NAR models requires a conditional independence assumption on target sequences.
We propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals.
Our approach consistently improves the accuracy of multiple NAR baselines without adding any decoding overhead.
- Score: 35.939982651768666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, non-autoregressive (NAR) neural machine translation models have
received increasing attention due to their efficient parallel decoding.
However, the probabilistic framework of NAR models necessitates a conditional
independence assumption on target sequences, falling short of characterizing
human language data. This drawback results in less informative learning signals
for NAR models under conventional MLE training, thereby yielding unsatisfactory
accuracy compared to their autoregressive (AR) counterparts. In this paper, we
propose a simple and model-agnostic multi-task learning framework to provide
more informative learning signals. During the training stage, we introduce a set
of sufficiently weak AR decoders that rely solely on the information provided by
the NAR decoder to make predictions, forcing the NAR decoder to become stronger, or
else it will be unable to support its weak AR partners. Experiments on WMT and
IWSLT datasets show that our approach consistently improves the accuracy of
multiple NAR baselines without adding any decoding overhead.
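To make the abstract's mechanism concrete: a NAR model factorizes p(y|x) as the product over positions t of p(y_t|x), whereas an AR model conditions each token on the prefix, p(y_t|y_<t, x). The following is a minimal, hedged PyTorch sketch of the multi-task idea as described in the abstract only: small "weak AR partner" decoders read nothing but the NAR decoder's hidden states and contribute auxiliary losses during training. The single-layer GRU partner, the loss weighting, and all names and sizes are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakARPartner(nn.Module):
    """A deliberately weak AR-style decoder: a single small GRU that sees only
    the NAR decoder's hidden states (an assumption; the paper's partners may differ)."""
    def __init__(self, hidden_dim, vocab_size):
        super().__init__()
        self.rnn = nn.GRU(hidden_dim, hidden_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, nar_hidden):
        # nar_hidden: (batch, tgt_len, hidden_dim). Gradients flow back into the
        # NAR decoder through nar_hidden, which is the point of the auxiliary task.
        h, _ = self.rnn(nar_hidden)
        return self.out(h)  # (batch, tgt_len, vocab_size)

def multitask_loss(nar_logits, nar_hidden, partners, targets, pad_id=0, aux_weight=0.5):
    """NAR cross-entropy plus auxiliary cross-entropy terms from the weak AR partners."""
    def ce(logits):
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1), ignore_index=pad_id)
    loss = ce(nar_logits)
    for partner in partners:
        loss = loss + aux_weight * ce(partner(nar_hidden))
    return loss

# Toy usage with random tensors standing in for a real NAR decoder's outputs.
batch, tgt_len, hidden_dim, vocab = 2, 7, 16, 100
nar_hidden = torch.randn(batch, tgt_len, hidden_dim, requires_grad=True)
nar_logits = torch.randn(batch, tgt_len, vocab, requires_grad=True)
targets = torch.randint(1, vocab, (batch, tgt_len))
partners = nn.ModuleList([WeakARPartner(hidden_dim, vocab) for _ in range(2)])

loss = multitask_loss(nar_logits, nar_hidden, partners, targets)
loss.backward()
# The partners are discarded at inference time, so decoding cost is unchanged,
# consistent with the abstract's claim of no additional decoding overhead.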
Related papers
- Leveraging Diverse Modeling Contexts with Collaborating Learning for
Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and non-autoregressive (NAR) models are two types of generative models for neural machine translation (NMT).
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z) - AMLNet: Adversarial Mutual Learning Neural Network for
Non-AutoRegressive Multi-Horizon Time Series Forecasting [4.911305944028228]
We introduce AMLNet, an innovative NAR model that achieves realistic forecasts through an online Knowledge Distillation approach.
AMLNet harnesses the strengths of both AR and NAR models by training a deep AR decoder and a deep NAR decoder in a collaborative manner.
This knowledge transfer is facilitated through two key mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of KD losses from the teacher models, enabling the shallow NAR decoder to incorporate the ensemble's diversity; and 2) hint-driven KD, which employs adversarial training to extract valuable insights from the model's hidden states for distillation.
arXiv Detail & Related papers (2023-10-30T06:10:00Z) - Paraformer: Fast and Accurate Parallel Transformer for
Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z) - A Survey on Non-Autoregressive Generation for Neural Machine Translation
and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of translation accuracy compared to autoregressive (AR) generation.
Many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z) - Distributionally Robust Recurrent Decoders with Random Network
Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z) - A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text
Generation [59.64193903397301]
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR).
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - Improving Non-autoregressive Neural Machine Translation with Monolingual
Data [13.43438045177293]
Non-autoregressive (NAR) neural machine translation is usually done via knowledge distillation from an autoregressive (AR) model.
We leverage large monolingual corpora to improve the NAR model's performance.
arXiv Detail & Related papers (2020-05-02T22:24:52Z) - A Study of Non-autoregressive Model for Sequence Generation [147.89525760170923]
Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel.
We propose knowledge distillation and source-target alignment to bridge the gap between AR and NAR models.
arXiv Detail & Related papers (2020-04-22T09:16:09Z)