Candidate Soups: Fusing Candidate Results Improves Translation Quality
for Non-Autoregressive Translation
- URL: http://arxiv.org/abs/2301.11503v1
- Date: Fri, 27 Jan 2023 02:39:42 GMT
- Title: Candidate Soups: Fusing Candidate Results Improves Translation Quality
for Non-Autoregressive Translation
- Authors: Huanran Zheng, Wei Zhu, Pengfei Wang and Xiaoling Wang
- Abstract summary: The non-autoregressive translation (NAT) model achieves much faster inference than the autoregressive translation (AT) model.
Existing NAT methods focus only on improving the NAT model's performance but do not fully utilize it.
We propose a simple but effective method called "Candidate Soups," which can obtain high-quality translations.
- Score: 15.332496335303189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The non-autoregressive translation (NAT) model achieves much faster inference
than the autoregressive translation (AT) model because it can predict all tokens
simultaneously during inference. However, its translation quality degrades
compared to AT. Moreover, existing NAT methods focus only on improving the NAT
model's performance but do not fully utilize it. In
this paper, we propose a simple but effective method called "Candidate Soups,"
which can obtain high-quality translations while maintaining the inference
speed of NAT models. Unlike previous approaches that pick a single result and
discard the rest, Candidate Soups (CDS) can fully use the valuable information
in the different candidate translations through model uncertainty.
Extensive experiments on two benchmarks (WMT'14 EN-DE and WMT'16 EN-RO)
demonstrate the effectiveness and generality of our proposed method, which can
significantly improve the translation quality of various base models. More
notably, our best variant outperforms the AT model on three translation tasks
with a 7.6-times speedup.
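
The abstract does not spell out the fusion algorithm, but the core idea of combining candidate translations via model uncertainty can be illustrated with a minimal sketch. The code below is an assumption-laden simplification, not the paper's CDS procedure: it supposes the candidates are already length-aligned and that a per-token confidence (e.g., predicted token probability) is available, and the function name `fuse_candidates` is hypothetical.

```python
# Illustrative sketch only: token-level fusion of candidate translations
# guided by per-token model confidence. This is NOT the exact Candidate
# Soups (CDS) algorithm; the equal-length alignment assumption and all
# names here are simplifications for illustration.
from typing import List


def fuse_candidates(candidates: List[List[str]],
                    confidences: List[List[float]]) -> List[str]:
    """At each position, keep the token from the candidate whose model
    confidence is highest, so information from all candidates is used
    rather than discarding all but one."""
    assert len(candidates) == len(confidences) > 0
    length = len(candidates[0])
    assert all(len(c) == length for c in candidates)

    fused = []
    for pos in range(length):
        # Compare the same position across all candidates and keep the
        # token the model was most certain about.
        best = max(range(len(candidates)),
                   key=lambda i: confidences[i][pos])
        fused.append(candidates[best][pos])
    return fused


if __name__ == "__main__":
    cands = [["das", "Haus", "ist", "rot"],
             ["das", "Haus", "war", "rot"]]
    confs = [[0.9, 0.8, 0.4, 0.95],
             [0.9, 0.8, 0.7, 0.90]]
    print(fuse_candidates(cands, confs))  # ['das', 'Haus', 'war', 'rot']
```

Because the fusion is a cheap per-position selection over candidates the NAT model has already produced in parallel, this style of combination preserves the latency advantage described in the abstract.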
Related papers
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
- Revisiting Non-Autoregressive Translation at Scale [76.93869248715664]
We systematically study the impact of scaling on non-autoregressive translation (NAT) behaviors.
We show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance.
We establish a new benchmark by validating scaled NAT models on a scaled dataset.
arXiv Detail & Related papers (2023-05-25T15:22:47Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Non-Autoregressive Document-Level Machine Translation [35.48195990457836]
Non-autoregressive translation (NAT) models achieve comparable performance and superior speed compared to auto-regressive translation (AT) models.
However, their abilities remain unexplored in document-level machine translation (MT).
We propose a simple but effective design of sentence alignment between source and target.
arXiv Detail & Related papers (2023-05-22T09:59:59Z)
- Multi-Granularity Optimization for Non-Autoregressive Translation [20.85478899258943]
Non-autoregressive machine translation (NAT) suffers severe performance deterioration due to the naive independence assumption.
We propose multi-granularity optimization for NAT, which collects model behaviors on translation segments of various granularities and integrates feedback for backpropagation.
Experiments on four WMT benchmarks show that the proposed method significantly outperforms the baseline models trained with cross-entropy loss.
arXiv Detail & Related papers (2022-10-20T04:54:29Z)
- Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision [33.04082398101807]
Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient.
Recent non-autoregressive translation models speed up the inference, but their quality is still inferior.
We propose DSLP, a highly efficient and high-performance model for machine translation.
arXiv Detail & Related papers (2021-10-14T16:36:12Z)
- Progressive Multi-Granularity Training for Non-Autoregressive Translation [98.11249019844281]
Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.
Recent studies show that NAT is weak at learning high-mode knowledge, such as one-to-many translations.
We argue that modes can be divided into various granularities which can be learned from easy to hard.
arXiv Detail & Related papers (2021-06-10T07:16:07Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade [47.97977478431973]
Fully non-autoregressive neural machine translation (NAT) predicts all tokens simultaneously with a single forward pass of the neural network.
In this work, we aim to close the performance gap while maintaining the latency advantage.
arXiv Detail & Related papers (2020-12-31T18:52:59Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.