Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in
Non-Autoregressive Translation
- URL: http://arxiv.org/abs/2106.00903v1
- Date: Wed, 2 Jun 2021 02:41:40 GMT
- Title: Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in
Non-Autoregressive Translation
- Authors: Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao and
Zhaopeng Tu
- Abstract summary: Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
- Score: 98.11249019844281
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Knowledge distillation (KD) is commonly used to construct synthetic data for
training non-autoregressive translation (NAT) models. However, there exists a
discrepancy in low-frequency words between the distilled and the original data,
leading to more errors in predicting low-frequency words. To alleviate the
problem, we directly expose the raw data to the NAT model by leveraging pretraining. By
analyzing directed alignments, we found that KD makes low-frequency source
words aligned with targets more deterministically but fails to align sufficient
low-frequency words from target to source. Accordingly, we propose reverse KD
to rejuvenate more alignments for low-frequency target words. To make the most
of authentic and synthetic data, we combine these complementary approaches as a
new training strategy for further boosting NAT performance. We conduct
experiments on five translation benchmarks over two advanced architectures.
Results demonstrate that the proposed approach can significantly and
universally improve translation quality by reducing translation errors on
low-frequency words. Encouragingly, our approach achieves 28.2 and 33.9 BLEU
points on the WMT14 English-German and WMT16 Romanian-English datasets,
respectively. Our code, data, and trained models are available at
https://github.com/longyuewangdcu/RLFW-NAT.
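As a rough illustration of the data-level strategy described in the abstract, the sketch below builds the three data views involved (authentic raw data, forward-distilled data from standard KD, and reverse-distilled data from reverse KD) and arranges them into a simple pretrain-then-finetune schedule. The `translate_forward` / `translate_reverse` teacher functions and the exact mixing order are illustrative assumptions, not the authors' released implementation.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def build_nat_training_data(
    raw: List[Pair],
    translate_forward: Callable[[str], str],   # hypothetical src->tgt AT teacher
    translate_reverse: Callable[[str], str],   # hypothetical tgt->src AT teacher
) -> dict:
    """Construct the three data views used by the strategy (sketch only).

    - raw: authentic parallel data, which keeps the original low-frequency alignments
    - forward KD: teacher-translated targets (the standard distilled NAT data)
    - reverse KD: teacher-translated sources, which re-exposes ("rejuvenates")
      low-frequency *target* words that forward KD tends to drop
    """
    forward_kd = [(src, translate_forward(src)) for src, _ in raw]
    reverse_kd = [(translate_reverse(tgt), tgt) for _, tgt in raw]
    return {"raw": raw, "forward_kd": forward_kd, "reverse_kd": reverse_kd}

def training_schedule(data: dict) -> List[Tuple[str, List[Pair]]]:
    """A simple pretrain-then-finetune curriculum over the data views.

    The ordering below is an assumption for illustration; the paper combines
    authentic and synthetic data, but the released code fixes the details.
    """
    return [
        ("pretrain on raw + reverse-KD data", data["raw"] + data["reverse_kd"]),
        ("finetune on forward-KD data", data["forward_kd"]),
    ]

if __name__ == "__main__":
    # Toy teachers that echo their input, purely to make the sketch runnable.
    toy = [("ein Haus", "a house"), ("seltenes Wort", "rare word")]
    data = build_nat_training_data(toy, lambda s: s, lambda t: t)
    for stage, pairs in training_schedule(data):
        print(stage, len(pairs), "pairs")
```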
Related papers
- DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation [29.76274107159478]
Non-autoregressive Transformers (NATs) are applied in direct speech-to-speech translation systems.
We introduce DiffNorm, a diffusion-based normalization strategy that simplifies data distributions for training NAT models.
Our strategies result in a notable improvement of about +7 ASR-BLEU for English-Spanish (En-Es) and +2 ASR-BLEU for English-French (En-Fr) on the CVSS benchmark.
arXiv Detail & Related papers (2024-05-22T01:10:39Z) - CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×.
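For readers unfamiliar with the CTC objective mentioned in this entry, the snippet below shows how a CTC loss over non-autoregressive output frames is typically computed in PyTorch. The tensor shapes and the single-output setup are illustrative assumptions; the paper's actual model uses two CTC-guided encoders for the source and target texts.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the paper's configuration).
T, N, C = 50, 4, 1000        # output frames, batch size, vocabulary size (blank = 0)
target_len = 12

# Log-probabilities emitted in parallel for every frame, as a NAT/CTC model would.
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)

# Reference token ids and sequence lengths.
targets = torch.randint(1, C, (N, target_len), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), target_len, dtype=torch.long)

# CTC marginalizes over all monotonic alignments between frames and tokens,
# which is what lets the model emit the whole sequence in a single pass.
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(float(loss))
```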
arXiv Detail & Related papers (2023-05-27T03:54:09Z) - Semi-supervised Neural Machine Translation with Consistency
Regularization for Low-Resource Languages [3.475371300689165]
This paper presents a simple yet effective method to tackle the problem for low-resource languages by augmenting high-quality sentence pairs and training NMT models in a semi-supervised manner.
Specifically, our approach combines a cross-entropy loss for supervised learning with a KL-divergence consistency loss for unsupervised learning over pseudo and augmented target sentences.
Experimental results show that our approach significantly improves NMT baselines, especially on low-resource datasets, by 0.46–2.03 BLEU points.
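The loss combination described in this entry can be sketched in a few lines of PyTorch; the interpolation weight and the direction of the KL term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(
    logits_supervised: torch.Tensor,   # model logits on labelled pairs, (B, T, V)
    gold_targets: torch.Tensor,        # gold token ids, (B, T)
    logits_pseudo: torch.Tensor,       # logits for the pseudo target, (B, T, V)
    logits_augmented: torch.Tensor,    # logits for the augmented target, (B, T, V)
    alpha: float = 1.0,                # weight of the consistency term (assumption)
) -> torch.Tensor:
    # Supervised cross-entropy on authentic parallel data.
    ce = F.cross_entropy(
        logits_supervised.reshape(-1, logits_supervised.size(-1)),
        gold_targets.reshape(-1),
    )
    # Consistency term: KL divergence pulling the two output distributions together.
    kl = F.kl_div(
        F.log_softmax(logits_augmented, dim=-1),
        F.softmax(logits_pseudo, dim=-1),
        reduction="batchmean",
    )
    return ce + alpha * kl

if __name__ == "__main__":
    B, T, V = 2, 5, 100
    loss = semi_supervised_loss(
        torch.randn(B, T, V), torch.randint(0, V, (B, T)),
        torch.randn(B, T, V), torch.randn(B, T, V),
    )
    print(float(loss))
```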
arXiv Detail & Related papers (2023-04-02T15:24:08Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - An Empirical Study of Language Model Integration for Transducer based
Speech Recognition [23.759084092602517]
Methods such as density ratio (DR) and internal language model (ILM) estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method.
We propose a low-order density ratio method (LODR) by training a low-order weak ILM for DR.
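The scoring rules being compared can be summarized in a few lines; the interpolation weights below are illustrative assumptions, and the external, source-domain, and internal LM scores would come from the respective estimation procedures.

```python
def shallow_fusion(log_p_asr: float, log_p_ext_lm: float, lam: float = 0.3) -> float:
    # Classic shallow fusion (SF): add a weighted external LM score.
    return log_p_asr + lam * log_p_ext_lm

def density_ratio(log_p_asr: float, log_p_ext_lm: float,
                  log_p_src_lm: float, lam: float = 0.3, mu: float = 0.3) -> float:
    # Density ratio (DR): additionally subtract a source-domain LM score.
    return log_p_asr + lam * log_p_ext_lm - mu * log_p_src_lm

def ilme(log_p_asr: float, log_p_ext_lm: float,
         log_p_ilm: float, lam: float = 0.3, mu: float = 0.3) -> float:
    # ILM estimation (ILME): subtract an estimate of the transducer's internal LM.
    return log_p_asr + lam * log_p_ext_lm - mu * log_p_ilm

# LODR, as proposed in this entry, follows the DR recipe but replaces the
# source-domain LM with a deliberately low-order (weak) internal LM; the
# concrete weights here are illustrative only.
print(shallow_fusion(-2.0, -3.0), density_ratio(-2.0, -3.0, -2.5), ilme(-2.0, -3.0, -2.2))
```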
arXiv Detail & Related papers (2022-03-31T03:33:50Z) - HintedBT: Augmenting Back-Translation with Quality and Transliteration
Hints [7.452359972117693]
Back-translation of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT).
We introduce HintedBT, a family of techniques that provides hints (through tags) to the encoder and decoder.
We show that using these hints, both separately and together, significantly improves translation quality.
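One simple way to realize tag-based hints is to prepend special tokens to back-translated source sentences; the tag names and quality-binning rule below are illustrative assumptions, not the paper's exact scheme.

```python
from typing import List, Tuple

def add_quality_hints(
    back_translated_pairs: List[Tuple[str, str, float]],  # (bt source, target, quality score)
    high_threshold: float = 0.7,   # illustrative threshold, not from the paper
) -> List[Tuple[str, str]]:
    """Prepend a quality tag to each back-translated source sentence.

    The encoder then sees e.g. "<bt_high> ..." or "<bt_low> ...", letting the
    model treat noisy synthetic pairs differently from cleaner ones.
    """
    tagged = []
    for src, tgt, score in back_translated_pairs:
        tag = "<bt_high>" if score >= high_threshold else "<bt_low>"
        tagged.append((f"{tag} {src}", tgt))
    return tagged

print(add_quality_hints([("das ist gut", "this is good", 0.9),
                         ("kaputter satz", "broken sentence", 0.2)]))
```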
arXiv Detail & Related papers (2021-09-09T17:43:20Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
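The alternative noising objectives described in this entry can be illustrated with a toy corruption function: instead of masking tokens, the input is locally reordered and some words are replaced, so the encoder still sees something that resembles a full sentence. The corruption rates here are illustrative assumptions.

```python
import random
from typing import List

def corrupt_sentence(tokens: List[str], vocab: List[str],
                     swap_prob: float = 0.15, replace_prob: float = 0.15,
                     seed: int = 0) -> List[str]:
    """Toy 'reorder + replace' noising, as an alternative to masking.

    Adjacent tokens are occasionally swapped and some tokens are replaced with
    other vocabulary items, producing an input that still looks like a real
    sentence rather than one full of mask symbols.
    """
    rng = random.Random(seed)
    out = list(tokens)
    for i in range(len(out) - 1):
        if rng.random() < swap_prob:
            out[i], out[i + 1] = out[i + 1], out[i]
    for i in range(len(out)):
        if rng.random() < replace_prob:
            out[i] = rng.choice(vocab)
    return out

sentence = "the cat sat on the mat".split()
print(corrupt_sentence(sentence, vocab=["dog", "ran", "under", "table"]))
```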
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model.
Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z) - Understanding and Improving Lexical Choice in Non-Autoregressive
Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)