Progressive Multi-Granularity Training for Non-Autoregressive
Translation
- URL: http://arxiv.org/abs/2106.05546v2
- Date: Fri, 11 Jun 2021 07:22:29 GMT
- Title: Progressive Multi-Granularity Training for Non-Autoregressive
Translation
- Authors: Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao,
Zhaopeng Tu
- Abstract summary: Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.
Recent studies show that NAT is weak at learning high-mode knowledge such as one-to-many translations.
We argue that modes can be divided into various granularities which can be learned from easy to hard.
- Score: 98.11249019844281
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Non-autoregressive translation (NAT) significantly accelerates the inference
process via predicting the entire target sequence. However, recent studies show
that NAT is weak at learning high-mode knowledge such as one-to-many
translations. We argue that modes can be divided into various granularities
which can be learned from easy to hard. In this study, we empirically show that
NAT models are prone to learn fine-grained lower-mode knowledge, such as words
and phrases, compared with sentences. Based on this observation, we propose
progressive multi-granularity training for NAT. More specifically, to make the
most of the training data, we break down sentence-level examples into three
granularities, i.e. words, phrases, and sentences, and progressively increase
the granularity as training proceeds. Experiments on Romanian-English,
English-German, Chinese-English, and Japanese-English demonstrate that our
approach improves phrase translation accuracy and the model's reordering
ability, resulting in better translation quality over strong NAT baselines.
Also, we show that more deterministic fine-grained knowledge can further
enhance performance.
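The snippet below is a minimal Python sketch of the easy-to-hard mixing idea described in the abstract. The class name GranularityCurriculum, the concrete mixing weights, and the assumption that word- and phrase-level pairs have already been extracted from the sentence-level corpus (e.g., via word alignment and phrase extraction) are illustrative assumptions, not the paper's actual training schedule.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Pair = Tuple[str, str]  # (source, target) text fragment


@dataclass
class GranularityCurriculum:
    """Shift the training mix from fine-grained pairs (words) toward
    coarse-grained pairs (full sentences) as training proceeds."""
    word_pairs: List[Pair]       # assumed pre-extracted, e.g. from word alignments
    phrase_pairs: List[Pair]     # assumed pre-extracted, e.g. from phrase extraction
    sentence_pairs: List[Pair]   # the original parallel corpus
    total_steps: int

    def mixing_weights(self, step: int) -> Tuple[float, float, float]:
        # Training progress from 0 (start) to 1 (end).
        p = min(step / self.total_steps, 1.0)
        w_sentence = p                   # grows toward 1
        w_phrase = 2.0 * p * (1.0 - p)   # peaks mid-training
        w_word = max(1.0 - w_sentence - w_phrase, 0.0)
        total = w_word + w_phrase + w_sentence
        return w_word / total, w_phrase / total, w_sentence / total

    def sample_batch(self, step: int, batch_size: int) -> List[Pair]:
        # Draw each example from one of the three granularity pools,
        # with probabilities given by the current mixing weights.
        pools = [self.word_pairs, self.phrase_pairs, self.sentence_pairs]
        weights = list(self.mixing_weights(step))
        return [random.choice(random.choices(pools, weights=weights, k=1)[0])
                for _ in range(batch_size)]


# Usage: early batches are dominated by word pairs, late batches by sentences.
# curriculum = GranularityCurriculum(word_pairs, phrase_pairs, sentence_pairs,
#                                    total_steps=100_000)
# batch = curriculum.sample_batch(step=5_000, batch_size=32)
```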
Related papers
- Selective Knowledge Distillation for Non-Autoregressive Neural Machine
Translation [34.22251326493591]
The Non-Autoregressive Transformer (NAT) achieves great success in neural machine translation tasks.
Existing knowledge distillation has side effects, such as propagating errors from the teacher to NAT students.
We introduce selective knowledge distillation, which uses an NAT model to select NAT-friendly targets that are of high quality and easy to learn.
arXiv Detail & Related papers (2023-03-31T09:16:13Z)
- Candidate Soups: Fusing Candidate Results Improves Translation Quality for Non-Autoregressive Translation [15.332496335303189]
The non-autoregressive translation (NAT) model achieves a much faster inference speed than the autoregressive translation (AT) model.
Existing NAT methods only focus on improving the NAT model's performance but do not fully utilize it.
We propose a simple but effective method called "Candidate Soups," which can obtain high-quality translations.
arXiv Detail & Related papers (2023-01-27T02:39:42Z)
- Rephrasing the Reference for Non-Autoregressive Machine Translation [37.816198073720614]
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: a source sentence may have multiple possible translations.
We introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output.
Our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
arXiv Detail & Related papers (2022-11-30T10:05:03Z)
- A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate machine translation, the speedup comes at the cost of translation accuracy compared with autoregressive (AR) generation.
Many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Sequence-Level Training for Non-Autoregressive Neural Machine Translation [33.17341980163439]
Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup.
We propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality.
arXiv Detail & Related papers (2021-06-15T13:30:09Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- Token-wise Curriculum Learning for Neural Machine Translation [94.93133801641707]
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from the training data at the early training stage.
We propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples.
Our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages.
arXiv Detail & Related papers (2021-03-20T03:57:59Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.