INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation
- URL: http://arxiv.org/abs/2306.06381v1
- Date: Sat, 10 Jun 2023 08:39:16 GMT
- Title: INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation
- Authors: Wenhao Zhu, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen
- Abstract summary: kNN-MT has provided an effective paradigm to smooth the prediction based on neighbor representations during inference.
We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters.
Experiments on four benchmark datasets show that method achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural machine translation has achieved promising results on many translation
tasks. However, previous studies have shown that neural models induce a
non-smooth representation space, which harms their generalization.
Recently, kNN-MT has provided an effective paradigm to smooth the prediction
based on neighbor representations during inference. Despite promising results,
kNN-MT usually requires large inference overhead. We propose an effective
training framework INK to directly smooth the representation space via
adjusting representations of kNN neighbors with a small number of new
parameters. The new parameters are then used to refresh the whole
representation datastore to get new kNN knowledge asynchronously. This loop
keeps running until convergence. Experiments on four benchmark datasets show
that INK achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming
the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference
speedup.
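The smoothing that the abstract refers to is the vanilla kNN-MT decoding rule: the decoder's hidden state queries a datastore of (representation, target-token) pairs, and the retrieval distribution is interpolated with the NMT distribution. A minimal NumPy sketch of that rule; the function name, the brute-force distance search, and the fixed interpolation weight are illustrative simplifications, not this paper's API:

```python
import numpy as np

def knn_mt_predict(nmt_probs, query, datastore_keys, datastore_values,
                   vocab_size, k=4, temperature=10.0, lam=0.5):
    """Interpolate the NMT distribution with a kNN retrieval distribution.

    A sketch of vanilla kNN-MT smoothing: retrieve the k nearest stored
    decoder states, turn their distances into weights, and mix the
    resulting token distribution with the base model's prediction.
    """
    # Squared L2 distances from the decoder hidden state to all stored keys
    dists = np.sum((datastore_keys - query) ** 2, axis=1)
    nn = np.argsort(dists)[:k]                    # indices of the k nearest keys
    # Softmax over negative distances gives the retrieval weights
    logits = -dists[nn] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Scatter the weights onto the vocabulary positions of the retrieved tokens
    knn_probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nn):
        knn_probs[datastore_values[idx]] += w
    # Final distribution: fixed-weight interpolation of the two distributions
    return lam * knn_probs + (1 - lam) * nmt_probs
```

Real systems replace the brute-force search with an approximate index (e.g. Faiss), which is exactly the inference overhead the INK framework aims to avoid by smoothing the representation space at training time instead.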
Related papers
- Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval [49.825549809652436]
$k$NN-MT constructs an external datastore to store domain-specific translation knowledge.
Adaptive retrieval ($k$NN-MT-AR) dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ falls below a fixed threshold.
We propose dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects.
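The adaptive-retrieval idea described above can be sketched as a gate in front of the datastore lookup: a small network predicts the mixing weight from the decoder state, and the expensive retrieval only runs when that weight clears a threshold. Everything here (`lambda_net`, `retrieve_fn`, the threshold value) is a hypothetical placeholder for illustration:

```python
import numpy as np

def adaptive_knn_step(nmt_probs, query, retrieve_fn, lambda_net, threshold=0.25):
    """Skip kNN retrieval when the predicted mixing weight is small.

    `lambda_net(query)` is assumed to return a mixing weight in [0, 1];
    `retrieve_fn(query)` is assumed to return a kNN token distribution.
    """
    lam = float(lambda_net(query))            # predicted mixing weight
    if lam < threshold:
        return np.asarray(nmt_probs)          # cheap path: no datastore lookup
    knn_probs = np.asarray(retrieve_fn(query))  # expensive path: kNN retrieval
    return lam * knn_probs + (1 - lam) * np.asarray(nmt_probs)
```

The design point is that most tokens are "easy" for the base model, so skipping retrieval for them trades little quality for a large speedup.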
arXiv Detail & Related papers (2024-06-10T07:36:55Z)
- Towards Faster k-Nearest-Neighbor Machine Translation [51.866464707284635]
k-nearest-neighbor machine translation approaches suffer from heavy retrieval overhead over the entire datastore when decoding each token.
We propose a simple yet effective multi-layer perceptron (MLP) network that predicts whether a token should be translated jointly by the neural machine translation model and the probabilities produced by kNN retrieval.
Our method significantly reduces the overhead of kNN retrievals by up to 53% at the expense of a slight decline in translation quality.
arXiv Detail & Related papers (2023-12-12T16:41:29Z)
- SEENN: Towards Temporal Spiking Early-Exit Neural Networks [26.405775809170308]
Spiking Neural Networks (SNNs) have recently become more popular as a biologically plausible substitute for traditional Artificial Neural Networks (ANNs).
We study a fine-grained adjustment of the number of timesteps in SNNs.
By dynamically adjusting the number of timesteps, our SEENN achieves a remarkable reduction in the average number of timesteps during inference.
arXiv Detail & Related papers (2023-04-02T15:57:09Z)
- Optimising Event-Driven Spiking Neural Network with Regularisation and Cutoff [31.61525648918492]
Spiking neural networks (SNNs) offer a closer mimicry of natural neural networks.
Current SNNs are trained to infer over a fixed duration.
We propose a cutoff mechanism for SNNs that can terminate inference at any time to improve efficiency.
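One common way to realize such an anytime cutoff is to accumulate output spikes over timesteps and stop as soon as the running spike-count distribution is confident. This sketch is illustrative only; `step_fn` and the confidence criterion are assumptions, not the paper's specific method:

```python
import numpy as np

def snn_infer_with_cutoff(step_fn, x, max_timesteps=32, conf_threshold=0.9):
    """Run an SNN timestep-by-timestep, exiting early once confident.

    `step_fn(x, t)` is assumed to return the output layer's spike vector
    (one count per class) at timestep t.
    """
    counts = None
    for t in range(1, max_timesteps + 1):
        spikes = step_fn(x, t)                            # spikes at timestep t
        counts = spikes if counts is None else counts + spikes
        total = counts.sum()
        if total > 0 and (counts / total).max() >= conf_threshold:
            return int(np.argmax(counts)), t              # early exit
    return int(np.argmax(counts)), max_timesteps
```

Easy inputs dominate the class counts quickly and exit after a few timesteps, which is where the average-timestep reduction comes from.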
arXiv Detail & Related papers (2023-01-23T16:14:09Z)
- Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
Noisy retrieved pairs can dramatically deteriorate model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
arXiv Detail & Related papers (2022-10-17T07:43:39Z)
- Nearest Neighbor Zero-Shot Inference [68.56747574377215]
kNN-Prompt is a technique that uses k-nearest-neighbor (kNN) retrieval augmentation for zero-shot inference with language models (LMs).
Fuzzy verbalizers leverage the sparse kNN distribution for downstream tasks by automatically associating each classification label with a set of natural language tokens.
Experiments show that kNN-Prompt is effective for domain adaptation with no further training, and that the benefits of retrieval increase with the size of the model used for kNN retrieval.
arXiv Detail & Related papers (2022-05-27T07:00:59Z)
- DNNR: Differential Nearest Neighbors Regression [8.667550264279166]
K-nearest neighbors (KNN) is one of the earliest and most established algorithms in machine learning.
For regression tasks, KNN averages the targets within a neighborhood, which poses a number of challenges.
We propose Differential Nearest Neighbors Regression (DNNR) that addresses both issues simultaneously.
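The core idea of differential nearest neighbors regression is to replace the plain neighborhood average with gradient-corrected estimates: fit a local linear model around the neighbors and extrapolate each neighbor's target toward the query. A minimal sketch under that reading; the shared-gradient least-squares fit is a simplification for illustration:

```python
import numpy as np

def dnnr_predict(X, y, query, k=2):
    """Differential nearest neighbors regression, sketched.

    Instead of averaging neighbor targets (plain KNN), estimate a local
    gradient from the neighborhood and average first-order Taylor
    corrections from each neighbor toward the query point.
    """
    dists = np.linalg.norm(X - query, axis=1)
    nn = np.argsort(dists)[:k]                    # k nearest neighbors
    # Estimate a shared local gradient by least squares on centered data
    Xc = X[nn] - X[nn].mean(axis=0)
    yc = y[nn] - y[nn].mean()
    grad, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    # Each neighbor predicts y(query) ≈ y_i + grad · (query - x_i)
    preds = y[nn] + (query - X[nn]) @ grad
    return preds.mean()
```

On a locally linear target this correction removes the bias of the plain KNN average, which is the gap the DNNR paper targets.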
arXiv Detail & Related papers (2022-05-17T15:22:53Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm simply retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the number of k for each target token.
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
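The adaptive-k idea above can be sketched as a mixture over candidate k values: a light meta-network scores each candidate k from the retrieval distances (with k = 0 meaning "trust the NMT model alone"), and the kNN distributions for the non-zero choices are combined with those scores. The function name, candidate set, and default uniform scorer are illustrative placeholders:

```python
import numpy as np

def adaptive_k_distribution(distances, values, vocab_size,
                            k_choices=(0, 1, 2, 4), meta_net=None,
                            temperature=10.0):
    """Adaptive kNN-MT sketch: mix kNN distributions over candidate k values.

    `distances`/`values` are the sorted retrieval distances and their
    target tokens; `meta_net` scores the candidate k values. The weight
    assigned to k = 0 is left for the base NMT distribution.
    """
    if meta_net is None:
        # Placeholder scorer: uniform weights over the candidate k values
        meta_net = lambda d: np.ones(len(k_choices)) / len(k_choices)
    k_weights = meta_net(distances)
    mixture = np.zeros(vocab_size)
    for kw, k in zip(k_weights, k_choices):
        if k == 0:
            continue                              # k = 0: ignore kNN entirely
        logits = -distances[:k] / temperature
        w = np.exp(logits - logits.max())
        w /= w.sum()
        for wi, v in zip(w, values[:k]):
            mixture[v] += kw * wi                 # scatter onto the vocabulary
    return mixture
```

Because the k = 0 share is withheld, the returned mass is at most 1; a full decoder would give that remaining share to the NMT model's own distribution.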
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.