Faster Nearest Neighbor Machine Translation
- URL: http://arxiv.org/abs/2112.08152v1
- Date: Wed, 15 Dec 2021 14:21:26 GMT
- Title: Faster Nearest Neighbor Machine Translation
- Authors: Shuhe Wang, Jiwei Li, Yuxian Meng, Rongbin Ouyang, Guoyin Wang, Xiaoya
Li, Tianwei Zhang, Shi Zong
- Abstract summary: $k$NN based neural machine translation ($k$NN-MT) has achieved state-of-the-art results in a variety of MT tasks.
One significant shortcoming of $k$NN-MT lies in its inefficiency in identifying the $k$ nearest neighbors of the query representation from the entire datastore.
We propose \textbf{Faster $k$NN-MT} to address this issue.
- Score: 27.38186214015994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: $k$NN based neural machine translation ($k$NN-MT) has achieved
state-of-the-art results in a variety of MT tasks. One significant shortcoming
of $k$NN-MT lies in its inefficiency in identifying the $k$ nearest neighbors
of the query representation from the entire datastore, which is prohibitively
time-intensive when the datastore size is large. In this work, we propose
\textbf{Faster $k$NN-MT} to address this issue. The core idea of Faster
$k$NN-MT is to use a hierarchical clustering strategy to approximate the
distance between the query and a data point in the datastore, which is
decomposed into two parts: the distance between the query and the center of the
cluster that the data point belongs to, and the distance between the data point
and the cluster center. We propose practical ways to compute these two parts in
a significantly faster manner. Through extensive experiments on different MT
benchmarks, we show that \textbf{Faster $k$NN-MT} is faster than Fast $k$NN-MT
\citep{meng2021fast} and only slightly (1.2 times) slower than its vanilla
counterpart, while preserving the model performance of $k$NN-MT. Faster $k$NN-MT
enables the deployment of $k$NN-MT models on real-world MT services.
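The distance decomposition above is easy to prototype. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: plain k-means stands in for the paper's hierarchical clustering, the approximate score is the triangle-inequality bound $d(q, c) + d(x, c)$, and all function names and parameters are illustrative.

```python
import numpy as np

def build_index(datastore, n_clusters, n_iters=10, seed=0):
    """Offline: cluster the datastore and cache each point's distance to its
    cluster center (the second part of the decomposed distance)."""
    rng = np.random.default_rng(seed)
    centers = datastore[rng.choice(len(datastore), n_clusters, replace=False)].copy()
    for _ in range(n_iters):  # a few Lloyd iterations of plain k-means
        dists = np.linalg.norm(datastore[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(n_clusters):
            members = datastore[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    point_to_center = np.linalg.norm(datastore - centers[assign], axis=-1)
    return centers, assign, point_to_center

def approx_knn(query, centers, assign, point_to_center, k):
    """Online: approximate d(q, x) by d(q, c(x)) + d(x, c(x)). Only n_clusters
    exact distances are computed per query instead of |datastore|."""
    q_to_center = np.linalg.norm(centers - query[None, :], axis=-1)  # first part
    scores = q_to_center[assign] + point_to_center  # triangle-inequality bound
    return np.argsort(scores)[:k]
```

Since the point-to-center distances are computed once offline, each query costs `n_clusters` exact distance evaluations plus a sort, rather than one evaluation per datastore entry; with millions of entries and a few thousand clusters, this gap is where the speedup comes from.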
Related papers
- Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval [49.825549809652436]
$k$NN-MT constructs an external datastore to store domain-specific translation knowledge.
Adaptive retrieval ($k$NN-MT-AR) dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ is less than a fixed threshold (see the sketch after this list).
We propose dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects.
arXiv Detail & Related papers (2024-06-10T07:36:55Z)
- A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation [57.952478914459164]
kNN-MT has provided an effective paradigm to smooth predictions based on neighbor representations during inference.
We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters.
Experiments on four benchmark datasets show that the method achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.
arXiv Detail & Related papers (2023-06-10T08:39:16Z)
- Simple and Scalable Nearest Neighbor Machine Translation [11.996135740547897]
$k$NN-MT is a powerful approach for fast domain adaptation.
We propose a simple and scalable nearest neighbor machine translation framework.
Our proposed approach achieves almost 90% of the speed of the NMT model without performance degradation.
arXiv Detail & Related papers (2023-02-23T17:28:29Z)
- Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation [48.58899349349702]
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism.
In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT.
arXiv Detail & Related papers (2022-12-17T08:34:20Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (\emph{i.e.}, SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Chunk-based Nearest Neighbor Machine Translation [7.747003493657217]
We introduce a \textit{chunk-based} $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token.
Experiments on machine translation in two settings, static domain adaptation and ``on-the-fly'' adaptation, show that the chunk-based model leads to a significant speed-up (up to 4 times) with only a small drop in translation quality.
arXiv Detail & Related papers (2022-05-24T17:39:25Z)
- Efficient Cluster-Based k-Nearest-Neighbor Machine Translation [65.69742565855395]
k-Nearest-Neighbor Machine Translation (kNN-MT) has been recently proposed as a non-parametric solution for domain adaptation in neural machine translation (NMT).
arXiv Detail & Related papers (2022-04-13T05:46:31Z)
- Fast Nearest Neighbor Machine Translation [30.242943649240328]
$k$NN-MT uses the entire reference corpus as the datastore for the nearest neighbor search.
Fast $k$NN-MT constructs a significantly smaller datastore for the nearest neighbor search.
Fast $k$NN-MT is two orders of magnitude faster than $k$NN-MT, and is only two times slower than the standard NMT model.
arXiv Detail & Related papers (2021-05-30T13:10:32Z)
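For concreteness, the $\lambda$-gated retrieval mechanism referenced in the dynamic-retrieval entry above can be sketched in a few lines. This is a hedged illustration, not that paper's implementation: it assumes the standard $k$NN-MT interpolation $p(y) = \lambda \, p_{kNN}(y) + (1 - \lambda) \, p_{NMT}(y)$, and `lambda_net`, `tau`, and the temperature `T` are hypothetical placeholders.

```python
import torch

def gated_knn_step(hidden, nmt_logits, keys, values, lambda_net, tau=0.25, k=8, T=10.0):
    """One decoding step with lambda-gated retrieval.
    hidden: (d,) decoder state; keys: (N, d) datastore keys;
    values: (N,) target-token ids; lambda_net: hypothetical learned estimator."""
    p_nmt = nmt_logits.softmax(dim=-1)
    lam = lambda_net(hidden)            # predicted interpolation weight in [0, 1]
    if lam.item() < tau:                # gate: skip kNN retrieval entirely
        return p_nmt
    # brute-force search shown for clarity; real systems use an ANN index
    dists = torch.cdist(hidden.unsqueeze(0), keys).squeeze(0)  # (N,)
    top = dists.topk(k, largest=False)
    p_knn = torch.zeros_like(p_nmt)
    p_knn.scatter_add_(0, values[top.indices], torch.softmax(-top.values / T, dim=0))
    return lam * p_knn + (1.0 - lam) * p_nmt
```

When the gate fires, a decoding step costs exactly one softmax over the vocabulary, so the fraction of skipped steps translates directly into inference speedup.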
This list is automatically generated from the titles and abstracts of the papers on this site.