Better Datastore, Better Translation: Generating Datastores from
Pre-Trained Models for Nearest Neural Machine Translation
- URL: http://arxiv.org/abs/2212.08822v1
- Date: Sat, 17 Dec 2022 08:34:20 GMT
- Title: Better Datastore, Better Translation: Generating Datastores from
Pre-Trained Models for Nearest Neural Machine Translation
- Authors: Jiahuan Li, Shanbo Cheng, Zewei Sun, Mingxuan Wang, Shujian Huang
- Abstract summary: Nearest Neighbor Machine Translation (kNNMT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism.
In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT.
- Score: 48.58899349349702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nearest Neighbor Machine Translation (kNNMT) is a simple and effective method
of augmenting neural machine translation (NMT) with a token-level nearest
neighbor retrieval mechanism. The effectiveness of kNNMT directly depends on
the quality of retrieved neighbors. However, original kNNMT builds datastores
based on representations from NMT models, which would result in poor retrieval
accuracy when NMT models are not good enough, leading to sub-optimal
translation performance. In this paper, we propose PRED, a framework that
leverages Pre-trained models for Datastores in kNN-MT. Better representations
from pre-trained models allow us to build datastores of better quality. We also
design a novel contrastive alignment objective to mitigate the representation
gap between the NMT model and pre-trained models, enabling the NMT model to
retrieve from better datastores. We conduct extensive experiments on both
bilingual and multilingual translation benchmarks, including WMT17 English
$\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14
German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical
results demonstrate the effectiveness of PRED.
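To make the retrieval mechanism concrete, below is a minimal sketch of a single token-level kNN-MT decoding step. It assumes a prebuilt nearest-neighbor index (e.g., FAISS) whose keys are context representations and whose values are target token ids; the names (`knn_mt_step`, `temperature`, `lam`) are illustrative and not taken from the paper's code.

```python
# Minimal sketch of one token-level kNN-MT decoding step (illustrative,
# not the authors' implementation). Assumes a FAISS-like index over
# datastore keys and an array `values` mapping each key to a token id.
import numpy as np

def knn_mt_step(nmt_probs, query, index, values, k=8, temperature=10.0, lam=0.5):
    """Interpolate the NMT distribution with a kNN distribution for one step.

    nmt_probs:   (vocab_size,) softmax output of the NMT model at this step
    query:       (dim,) representation of the current decoding context
    index:       object with .search(queries, k) -> (distances, ids), e.g. FAISS
    values:      (datastore_size,) token id stored with each datastore key
    temperature: softens the distance-based neighbor weights (assumed value)
    lam:         interpolation weight between kNN and NMT distributions
    """
    distances, ids = index.search(query[None, :].astype("float32"), k)
    weights = np.exp(-distances[0] / temperature)   # closer neighbors get more mass
    weights /= weights.sum()

    knn_probs = np.zeros_like(nmt_probs)
    for w, i in zip(weights, ids[0]):
        knn_probs[values[i]] += w                   # aggregate neighbor mass per token

    # Final next-token distribution: fixed interpolation of the two predictions.
    return lam * knn_probs + (1.0 - lam) * nmt_probs
```

In this picture, the quality of the index keys is exactly what the paper targets: under PRED, the datastore keys would come from a pre-trained model rather than the NMT decoder, and the contrastive alignment objective trains the NMT model's queries to be compatible with that representation space.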
Related papers
- knn-seq: Efficient, Extensible kNN-MT Framework [11.421689052786467]
k-nearest-neighbor machine translation (kNN-MT) boosts the translation quality of a pre-trained neural machine translation (NMT) model by utilizing translation examples during decoding.
Because of the datastore's size, it is computationally expensive both to construct the datastore and to retrieve examples from it.
We present knn-seq, an efficient and extensible kNN-MT framework for researchers and developers, carefully designed to run efficiently even with a billion-scale datastore.
arXiv Detail & Related papers (2023-10-18T21:56:04Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), which works with data from different tasks.
UMLNMT achieves substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
However, retrieved noisy pairs can dramatically deteriorate model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
arXiv Detail & Related papers (2022-10-17T07:43:39Z) - Chunk-based Nearest Neighbor Machine Translation [7.747003493657217]
We introduce a chunk-based $k$NN-MT model, which retrieves chunks of tokens from the datastore instead of a single token.
Experiments on machine translation in two settings, static domain adaptation and "on-the-fly" adaptation, show that the chunk-based model leads to a significant speed-up (up to 4 times) with only a small drop in translation quality.
arXiv Detail & Related papers (2022-05-24T17:39:25Z) - End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences and makes synthetic bilingual sentence pairs for the training of the other NMT model.
The discrete property of translated sentences prevents information gradient from flowing between the two NMT models.
arXiv Detail & Related papers (2022-02-17T06:31:03Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Nearest Neighbor Machine Translation [113.96357168879548]
We introduce $k$-nearest-neighbor machine translation ($k$NN-MT).
It predicts tokens with a nearest neighbor classifier over a large datastore of cached examples; a rough sketch of how such a datastore can be built follows this list.
It consistently improves performance across many settings.
arXiv Detail & Related papers (2020-10-01T22:24:46Z)
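As a companion to the decoding sketch above, the following is a rough illustration of datastore construction as described in the "Nearest Neighbor Machine Translation" entry: every (translation context, next target token) pair from the parallel data becomes one key-value entry. `encode_context` is a placeholder for whatever model produces the keys, the NMT decoder in vanilla kNN-MT or a pre-trained model under PRED; all names here are illustrative rather than taken from the papers' code.

```python
# Rough sketch of kNN-MT datastore construction (illustrative only).
import numpy as np
import faiss  # assumed available; any exact or approximate NN index works

def build_datastore(corpus, encode_context, dim):
    """corpus: iterable of (source, target_token_ids) training pairs.
    encode_context(source, target_prefix) -> (dim,) float vector (placeholder).
    """
    keys, values = [], []
    for source, target_tokens in corpus:
        for t, token in enumerate(target_tokens):
            # Key: representation of the source plus the target prefix so far.
            # Value: the gold token that follows that context.
            keys.append(encode_context(source, target_tokens[:t]))
            values.append(token)

    keys = np.asarray(keys, dtype="float32")
    index = faiss.IndexFlatL2(dim)  # exact L2 search; large datastores typically use quantized/IVF indexes
    index.add(keys)
    return index, np.asarray(values, dtype="int64")
```

The returned `index` and `values` are exactly what the decoding sketch after the main abstract consumes; swapping the model behind `encode_context` is the lever PRED uses to improve retrieval quality.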