N-Gram Nearest Neighbor Machine Translation
- URL: http://arxiv.org/abs/2301.12866v1
- Date: Mon, 30 Jan 2023 13:19:19 GMT
- Title: N-Gram Nearest Neighbor Machine Translation
- Authors: Rui Lv, Junliang Guo, Rui Wang, Xu Tan, Qi Liu, Tao Qin
- Abstract summary: We propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both Autoregressive Translation (AT) and Non-Autoregressive Translation (NAT) models.
We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as domain adaptation translation tasks.
- Score: 101.25243884801183
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Nearest neighbor machine translation augments Autoregressive
Translation (AT) with $k$-nearest-neighbor retrieval, by comparing the
similarity between the token-level context representations of the target tokens
in the query and the datastore. However, the token-level representation may
introduce noise when translating ambiguous words, or fail to provide accurate
retrieval results when the representation generated by the model contains
indistinguishable context information, e.g., Non-Autoregressive
Translation (NAT) models. In this paper, we propose a novel $n$-gram nearest
neighbor retrieval method that is model agnostic and applicable to both AT and
NAT models. Specifically, we concatenate the adjacent $n$-gram hidden
representations as the key, while the tuple of corresponding target tokens is
the value. In inference, we propose tailored decoding algorithms for AT and NAT
models respectively. We demonstrate that the proposed method consistently
outperforms the token-level method on both AT and NAT models as well on general
as on domain adaptation translation tasks. On domain adaptation, the proposed
method brings $1.03$ and $2.76$ improvements regarding the average BLEU score
on AT and NAT models respectively.
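To make the datastore design above concrete, here is a minimal, hypothetical sketch of $n$-gram key/value construction and retrieval. It is not the authors' implementation: the function names, the use of a FAISS flat L2 index, the choice of $n=2$, and the toy data are assumptions for illustration only.

```python
# Hypothetical sketch of n-gram datastore construction and retrieval (not the paper's code).
# Assumes one hidden-state matrix of shape (seq_len, d_model) per target sentence.
import numpy as np
import faiss  # assumption: faiss-cpu is available for the nearest-neighbor index


def build_ngram_datastore(hidden_states, target_tokens, n=2):
    """Keys: concatenation of n adjacent hidden vectors; values: the corresponding token n-grams."""
    keys, values = [], []
    for h, y in zip(hidden_states, target_tokens):
        for i in range(len(y) - n + 1):
            keys.append(np.concatenate(h[i:i + n]))  # key = concat of n adjacent representations
            values.append(tuple(y[i:i + n]))         # value = tuple of the n target tokens
    keys = np.stack(keys).astype("float32")
    index = faiss.IndexFlatL2(keys.shape[1])         # exact L2 search, for simplicity
    index.add(keys)
    return index, values


def query_ngrams(index, values, query_states, n=2, k=8):
    """Retrieve the k nearest n-gram entries for the last n decoder states of the query."""
    q = np.concatenate(query_states[-n:]).astype("float32")[None, :]
    distances, ids = index.search(q, k)
    return [(values[j], float(d)) for j, d in zip(ids[0], distances[0])]


# Toy usage with random 4-dimensional "hidden states" and integer token ids.
rng = np.random.default_rng(0)
hidden = [rng.normal(size=(5, 4)) for _ in range(3)]
tokens = [[1, 2, 3, 4, 5], [2, 3, 6, 7, 8], [1, 2, 9, 3, 4]]
index, values = build_ngram_datastore(hidden, tokens, n=2)
print(query_ngrams(index, values, hidden[0][:2], n=2, k=3))
```

At decoding time, the retrieved token tuples would be converted into a retrieval distribution and combined with the model's own prediction; the paper's tailored AT/NAT decoding algorithms are not reproduced here.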
Related papers
- Chunk-based Nearest Neighbor Machine Translation [7.747003493657217]
We introduce a chunk-based $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token.
Experiments on machine translation in two settings, static domain adaptation and "on-the-fly" adaptation, show that the chunk-based model leads to a significant speed-up (up to 4 times) with only a small drop in translation quality.
arXiv Detail & Related papers (2022-05-24T17:39:25Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
- MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation [0.5586191108738562]
Conditional masked language models (CMLM) have shown impressive progress in non-autoregressive machine translation (NAT).
We introduce Multi-view Subset Regularization (MvSR), a novel regularization method to improve the performance of the NAT model.
We achieve remarkable performance on three public benchmarks with 0.36-1.14 BLEU gains over previous NAT models.
arXiv Detail & Related papers (2021-08-19T02:30:38Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm simply retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the value of k for each target token.
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
- Nearest Neighbor Machine Translation [113.96357168879548]
We introduce $k$-nearest-neighbor machine translation ($k$NN-MT).
It predicts tokens with a nearest neighbor classifier over a large datastore of cached examples (see the interpolation formula after this list).
It consistently improves performance across many settings.
arXiv Detail & Related papers (2020-10-01T22:24:46Z)
- Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation [188.3605563567253]
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT).
We introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT covers AT and NAT as its special cases.
We design curriculum schedules to gradually shift k from 1 to N, with different pacing functions and numbers of tasks trained at the same time.
Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines.
arXiv Detail & Related papers (2020-07-17T06:06:54Z)
- LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)
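For reference, the token-level retrieval that the Nearest Neighbor Machine Translation entry above describes (and that the $n$-gram method generalizes) interpolates a retrieval distribution with the base model's prediction. In the original $k$NN-MT formulation, with $f(x, y_{<t})$ the decoder context representation, $d(\cdot, \cdot)$ the distance, $T$ a temperature, $\lambda$ the interpolation weight, and $\mathcal{N}$ the retrieved key-value pairs:

$$ p(y_t \mid x, y_{<t}) = \lambda\, p_{\mathrm{kNN}}(y_t \mid x, y_{<t}) + (1 - \lambda)\, p_{\mathrm{MT}}(y_t \mid x, y_{<t}) $$
$$ p_{\mathrm{kNN}}(y_t \mid x, y_{<t}) \propto \sum_{(k_j, v_j) \in \mathcal{N}} \mathbb{1}[y_t = v_j] \exp\!\big(-d(k_j, f(x, y_{<t})) / T\big) $$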