Frequency-Aware Contrastive Learning for Neural Machine Translation
- URL: http://arxiv.org/abs/2112.14484v1
- Date: Wed, 29 Dec 2021 10:10:10 GMT
- Title: Frequency-Aware Contrastive Learning for Neural Machine Translation
- Authors: Tong Zhang, Wei Ye, Baosong Yang, Long Zhang, Xingzhang Ren, Dayiheng
Liu, Jinan Sun, Shikun Zhang, Haibo Zhang, Wen Zhao
- Abstract summary: Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems.
Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective.
We propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words.
- Score: 24.336356651877388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-frequency word prediction remains a challenge in modern neural machine
translation (NMT) systems. Recent adaptive training methods promote the output
of infrequent words by emphasizing their weights in the overall training
objectives. Despite the improved recall of low-frequency words, their
prediction precision is unexpectedly hindered by the adaptive objectives.
Inspired by the observation that low-frequency words form a more compact
embedding space, we tackle this challenge from a representation learning
perspective. Specifically, we propose a frequency-aware token-level contrastive
learning method, in which the hidden state of each decoding step is pushed away
from the counterparts of other target words, in a soft contrastive way based on
the corresponding word frequencies. We conduct experiments on widely used NIST
Chinese-English and WMT14 English-German translation tasks. Empirical results
show that our proposed methods can not only significantly improve the
translation quality but also enhance lexical diversity and optimize word
representation space. Further investigation reveals that, compared with
related adaptive training strategies, the superiority of our method on
low-frequency word prediction lies in the robustness of token-level recall
across different frequencies without sacrificing precision.
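The abstract states the idea only at a high level. Below is a minimal sketch of how a frequency-aware, token-level soft contrastive term of this kind might look in PyTorch; the weighting function, temperature, and pairing scheme are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def frequency_aware_contrastive_loss(hidden, token_ids, token_freq,
                                      temperature=0.1):
    """Illustrative sketch only; not the paper's exact formulation.

    hidden:     (N, d) decoder hidden states, one per target position in the batch
    token_ids:  (N,)   gold token id at each position
    token_freq: (V,)   corpus frequency of every vocabulary item
    """
    h = F.normalize(hidden, dim=-1)          # work in cosine-similarity space
    sim = h @ h.t() / temperature            # (N, N) pairwise similarities

    # Positions whose gold tokens differ act as negatives to be pushed apart;
    # the diagonal and same-token pairs are excluded automatically.
    mask = (token_ids.unsqueeze(0) != token_ids.unsqueeze(1)).float()

    # Soft, frequency-dependent weights: pairs involving rarer tokens are
    # pushed apart harder (inverse log frequency is an assumed choice here).
    inv_freq = 1.0 / torch.log(token_freq[token_ids].float() + 2.0)
    weight = inv_freq.unsqueeze(0) * inv_freq.unsqueeze(1)

    # Smooth repulsion penalty on the similarity of negative pairs.
    loss = (weight * F.softplus(sim) * mask).sum() / mask.sum().clamp(min=1.0)
    return loss
```

In practice a term like this would be added to the usual cross-entropy objective with a mixing coefficient, so likelihood still drives translation quality while the contrastive term spreads out the representation space of rare words.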
Related papers
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234] (2024-03-30)
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
(An illustrative sketch of vocabulary trimming is given after this list.)
- Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition [21.61242091927018] (2023-02-20)
Out-of-vocabulary (OOV) words, such as trending words and new named entities, pose problems for modern ASR systems.
We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words.
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937] (2021-11-14)
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281] (2021-06-02)
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896] (2020-10-10)
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
(An illustrative anti-focal-style sketch is given after this list.)
- Token-level Adaptive Training for Neural Machine Translation [84.69646428587548] (2020-10-09)
There exists a token imbalance phenomenon in natural language, as different tokens appear with different frequencies.
A vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies.
Low-frequency tokens may carry critical semantic information that will affect the translation quality once they are neglected.
(An illustrative frequency-weighted cross-entropy sketch is given after this list.)
- Measuring Memorization Effect in Word-Level Neural Networks Probing [0.9156064716689833] (2020-06-29)
We propose a simple general method for measuring the memorization effect, based on a symmetric selection of test words seen versus unseen in training.
Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate.
- Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training [66.39561682517741] (2020-02-28)
Unsupervised neural machine translation (UNMT) has attracted great interest in the machine translation community.
The main advantage of UNMT lies in how easily the required large volumes of monolingual training text can be collected.
In this paper, we explicitly take noisy data into consideration for the first time to improve the robustness of UNMT-based systems.
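Illustrative sketch for the vocabulary-trimming entry above (a sketch under assumptions, not the setup analyzed in that paper): subwords below a frequency threshold are dropped from the working vocabulary, and at segmentation time any dropped subword is recursively split back into the units it was merged from. The `merges` mapping is an assumed structure recording, for each merged subword, the pair it was built from.

```python
def trim_vocab(freq, threshold):
    """Keep only subwords seen at least `threshold` times in the corpus."""
    return {subword for subword, count in freq.items() if count >= threshold}

def resegment(subword, kept, merges):
    """Recursively undo BPE merges for subwords outside the trimmed vocabulary.

    merges: assumed dict mapping a merged subword to the (left, right) pair
            it was created from; single characters are never split further.
    """
    if subword in kept or subword not in merges:
        return [subword]
    left, right = merges[subword]
    return resegment(left, kept, merges) + resegment(right, kept, merges)

# Example: if "lowest" is rare, it falls back to its components "low" + "est".
kept = trim_vocab({"low": 120, "est": 95, "lowest": 3}, threshold=50)
merges = {"lowest": ("low", "est")}
print(resegment("lowest", kept, merges))  # ['low', 'est']
```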
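Illustrative sketch for the Anti-Focal loss entry above. Focal-style objectives reshape cross-entropy with a modulating factor on the gold-token probability; the variant below, in contrast to focal loss, allocates relatively less penalty to low-confidence predictions. The specific factor (1 + p)^gamma is an assumption for illustration and may not match the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def anti_focal_loss(logits, targets, gamma=0.5):
    """Anti-focal-style loss (assumed form, see note above).

    logits:  (N, V) unnormalized scores over the vocabulary
    targets: (N,)   gold token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)
    gold_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    gold_p = gold_logp.exp()
    # The modulating factor grows with confidence, so low-confidence (often
    # long-tail) tokens receive relatively less penalty than under focal loss.
    return -((1.0 + gold_p) ** gamma * gold_logp).mean()
```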
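Illustrative sketch for the token-level adaptive training entry above. Adaptive training of this kind amounts to re-weighting the per-token cross-entropy with a monotone function of corpus frequency; the inverse-log-frequency weight below is an assumed example rather than the weighting actually proposed in that work.

```python
import torch
import torch.nn.functional as F

def frequency_weighted_ce(logits, targets, token_freq):
    """Token-level adaptive cross-entropy: rarer gold tokens get larger weights.

    logits:     (N, V) unnormalized scores; targets: (N,) gold token ids
    token_freq: (V,)   corpus frequency of every vocabulary item
    """
    ce = F.cross_entropy(logits, targets, reduction="none")   # (N,)
    # Assumed weighting: inverse log frequency, normalized to mean 1 so the
    # overall loss scale stays comparable to plain cross-entropy.
    w = 1.0 / torch.log(token_freq[targets].float() + 2.0)
    w = w / w.mean()
    return (w * ce).mean()
```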