Learning Kernel-Smoothed Machine Translation with Retrieved Examples
- URL: http://arxiv.org/abs/2109.09991v1
- Date: Tue, 21 Sep 2021 06:42:53 GMT
- Title: Learning Kernel-Smoothed Machine Translation with Retrieved Examples
- Authors: Qingnan Jiang, Mingxuan Wang, Jun Cao, Shanbo Cheng, Shujian Huang and
Lei Li
- Abstract summary: Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples.
We propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online.
- Score: 30.17061384497846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to effectively adapt neural machine translation (NMT) models according to
emerging cases without retraining? Despite the great success of neural machine
translation, updating the deployed models online remains a challenge. Existing
non-parametric approaches that retrieve similar examples from a database to
guide the translation process are promising but are prone to overfit the
retrieved examples. In this work, we propose to learn Kernel-Smoothed
Translation with Example Retrieval (KSTER), an effective approach to adapt
neural machine translation models online. Experiments on domain adaptation and
multi-domain machine translation datasets show that even without expensive
retraining, KSTER is able to achieve improvement of 1.1 to 1.5 BLEU scores over
the best existing online adaptation methods. The code and trained models are
released at https://github.com/jiangqn/KSTER.
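A minimal sketch of the idea described above, based only on this abstract: the NMT model's next-token distribution is interpolated with a kernel-weighted distribution over examples retrieved from a datastore. The Gaussian kernel, the fixed bandwidth, and the fixed mixing weight below are illustrative assumptions rather than the paper's exact, learned formulation.
```python
# Illustrative sketch of kernel-smoothed translation with retrieved examples.
# Assumptions (not taken from the paper): a Gaussian kernel over squared
# distances between decoder hidden states, a fixed bandwidth, and a fixed
# mixing weight between the parametric and retrieval distributions.
import numpy as np

def kernel_smoothed_distribution(model_probs, query, keys, values,
                                 vocab_size, bandwidth=10.0, mix=0.5):
    """Interpolate the NMT next-token distribution with a retrieval distribution.

    model_probs: (vocab_size,) next-token probabilities from the NMT model.
    query:       (d,) decoder hidden state at the current decoding step.
    keys:        (k, d) hidden states of the k retrieved examples.
    values:      (k,) target-token ids paired with the retrieved keys.
    """
    # Kernel weights from distances between the query and the retrieved keys.
    distances = np.sum((keys - query) ** 2, axis=1)
    weights = np.exp(-distances / bandwidth)
    weights /= weights.sum()

    # Scatter the kernel weights onto the vocabulary; duplicate tokens accumulate.
    retrieval_probs = np.zeros(vocab_size)
    np.add.at(retrieval_probs, values, weights)

    # Mix the parametric and non-parametric distributions.
    return mix * model_probs + (1.0 - mix) * retrieval_probs
```
Online adaptation then amounts to adding new example pairs to the retrieval database; the sketch only shows the interpolation step, whereas KSTER, as the title suggests, learns the kernel-smoothed combination itself.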
Related papers
- Can the Variation of Model Weights be used as a Criterion for Self-Paced Multilingual NMT? [7.330978520551704]
Many-to-one neural machine translation systems improve over one-to-one systems when training data is scarce.
In this paper, we design and test a novel algorithm for selecting the language of minibatches when training such systems.
arXiv Detail & Related papers (2024-10-05T12:52:51Z)
- End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences and makes synthetic bilingual sentence pairs for the training of the other NMT model.
The discrete property of translated sentences prevents gradient information from flowing between the two NMT models; a generic sketch of the categorical reparameterization trick that addresses this is given after the list below.
arXiv Detail & Related papers (2022-02-17T06:31:03Z)
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
- Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model.
Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z)
- Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation [86.40610684026262]
In this work, we identify inactive training examples that contribute less to model performance.
We introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples.
Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models.
arXiv Detail & Related papers (2020-10-06T08:57:31Z)
- Enhanced back-translation for low resource neural machine translation using self-training [0.0]
This work proposes a self-training strategy where the output of the backward model is used to improve the model itself through the forward translation technique.
The technique was shown to improve baseline low-resource IWSLT'14 English-German and IWSLT'15 English-Vietnamese backward translation models by 11.06 and 1.5 BLEU points, respectively.
The synthetic data generated by the improved English-German backward model was used to train a forward model that outperformed another forward model trained using standard back-translation by 2.7 BLEU.
arXiv Detail & Related papers (2020-06-04T14:19:52Z)
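As referenced in the back-translation entry above, the categorical reparameterization trick is commonly realized with Gumbel-softmax sampling, which replaces a hard categorical sample with a temperature-controlled soft one-hot vector so that gradients can flow through the "sampled" tokens. The sketch below is a generic illustration under that assumption, not that paper's exact implementation.
```python
# Generic illustration of Gumbel-softmax (categorical reparameterization):
# represent a sampled token as a soft one-hot vector so the sampling step
# stays differentiable end to end.
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Return a relaxed one-hot sample over the vocabulary.

    logits: (vocab_size,) unnormalized next-token scores.
    As temperature -> 0 the output approaches a hard one-hot vector;
    larger temperatures give smoother, more uniform vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    # Softmax over the perturbed, temperature-scaled logits.
    scores = (logits + gumbel_noise) / temperature
    scores = np.exp(scores - scores.max())
    return scores / scores.sum()
```
Feeding such soft samples from the backward model into the forward model lets gradient information flow between the two NMT models, which is the obstacle that abstract highlights; a straight-through variant additionally uses the hard argmax in the forward pass while keeping the soft gradients.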