Improving Non-autoregressive Neural Machine Translation with Monolingual Data
- URL: http://arxiv.org/abs/2005.00932v3
- Date: Sun, 29 Nov 2020 21:48:51 GMT
- Title: Improving Non-autoregressive Neural Machine Translation with Monolingual Data
- Authors: Jiawei Zhou, Phillip Keung
- Abstract summary: Non-autoregressive (NAR) neural machine translation is usually done via knowledge distillation from an autoregressive (AR) model.
We leverage large monolingual corpora to improve the NAR model's performance.
- Score: 13.43438045177293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive (NAR) neural machine translation is usually done via
knowledge distillation from an autoregressive (AR) model. Under this framework,
we leverage large monolingual corpora to improve the NAR model's performance,
with the goal of transferring the AR model's generalization ability while
preventing overfitting. On top of a strong NAR baseline, our experimental
results on the WMT14 En-De and WMT16 En-Ro news translation tasks confirm that
monolingual data augmentation consistently improves the performance of the NAR
model to approach the teacher AR model's performance, yields comparable or
better results than the best non-iterative NAR methods in the literature and
helps reduce overfitting in the training process.
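The abstract describes the method only at a high level: an AR teacher supplies distilled targets, and additional monolingual source sentences are translated by the same teacher to enlarge the NAR training set. The sketch below illustrates that data-construction step under those assumptions; it is not the authors' released code, the helper `ar_teacher_translate` is a hypothetical placeholder for a real AR NMT system, and details such as filtering or which language side the monolingual data comes from are not specified in the abstract.

```python
from typing import Callable, Iterable, List, Tuple

ParallelCorpus = List[Tuple[str, str]]  # (source sentence, teacher translation)


def build_nar_training_data(
    parallel_src: Iterable[str],
    monolingual_src: Iterable[str],
    ar_teacher_translate: Callable[[List[str]], List[str]],
    batch_size: int = 64,
) -> ParallelCorpus:
    """Build distilled training pairs for the NAR student.

    Every source sentence (the original parallel sources plus the extra
    monolingual sources) is decoded by the AR teacher, and the teacher
    outputs are used as training targets (sequence-level distillation).
    The monolingual portion supplies the additional synthetic pairs that
    the abstract credits with improving the NAR model.
    """
    corpus: ParallelCorpus = []
    sources = list(parallel_src) + list(monolingual_src)
    for start in range(0, len(sources), batch_size):
        batch = sources[start:start + batch_size]
        for src, hyp in zip(batch, ar_teacher_translate(batch)):
            corpus.append((src, hyp))
    return corpus


if __name__ == "__main__":
    # Toy stand-in for a trained AR teacher; a real pipeline would call an
    # actual AR NMT model here (hypothetical, not from the paper).
    def fake_teacher(batch: List[str]) -> List[str]:
        return [s.upper() for s in batch]

    pairs = build_nar_training_data(
        parallel_src=["ein kleiner test"],
        monolingual_src=["noch ein satz", "und noch einer"],
        ar_teacher_translate=fake_teacher,
    )
    print(len(pairs), "distilled pairs for NAR training")
```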
Related papers
- Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation [15.632419297059993]
Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT).
A performance gap exists between NAR and autoregressive models due to the large decoding space and the difficulty of accurately capturing dependencies between target words.
We apply reinforcement learning (RL) to Levenshtein Transformer, a representative edit-based NAR model, demonstrating that RL with self-generated data can enhance the performance of edit-based NAR models.
arXiv Detail & Related papers (2024-05-02T13:39:28Z)
- Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z)
- Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators [35.939982651768666]
The probabilistic framework of NAR models requires a conditional independence assumption on target sequences.
We propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals.
Our approach consistently improves the accuracy of multiple NAR baselines without adding any additional decoding overhead.
arXiv Detail & Related papers (2022-11-11T09:10:14Z)
- Non-Autoregressive Machine Translation: It's Not as Fast as it Seems [84.47091735503979]
We point out flaws in the evaluation methodology present in the literature on NAR models.
We compare NAR models with other widely used methods for improving efficiency.
We call for more realistic and extensive evaluation of NAR models in future work.
arXiv Detail & Related papers (2022-05-04T09:30:17Z)
- A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate machine translation inference, the speedup comes at the cost of reduced translation accuracy compared to autoregressive (AR) generation.
Many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z)
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks in the early stage of training and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
- Can Multilinguality benefit Non-autoregressive Machine Translation? [11.671379480940407]
Non-autoregressive (NAR) machine translation has recently achieved significant improvements, and now outperforms autoregressive (AR) models on some benchmarks.
We present a comprehensive empirical study of multilingual NAR.
We test its capabilities with respect to positive transfer between related languages and negative transfer under capacity constraints.
arXiv Detail & Related papers (2021-12-16T02:20:59Z)
- A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation [59.64193903397301]
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR).
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
arXiv Detail & Related papers (2021-10-11T13:05:06Z) - Non-Parametric Online Learning from Human Feedback for Neural Machine
Translation [54.96594148572804]
We study the problem of online learning with human feedback in the human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)