Nearest Neighbor Machine Translation is Meta-Optimizer on Output
Projection Layer
- URL: http://arxiv.org/abs/2305.13034v2
- Date: Tue, 24 Oct 2023 10:22:05 GMT
- Title: Nearest Neighbor Machine Translation is Meta-Optimizer on Output
Projection Layer
- Authors: Ruize Gao, Zhirui Zhang, Yichao Du, Lemao Liu, Rui Wang
- Abstract summary: Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks.
We comprehensively analyze $k$NN-MT through theoretical and empirical studies.
- Score: 44.02848852485475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in
domain adaptation tasks by integrating pre-trained Neural Machine Translation
(NMT) models with domain-specific token-level retrieval. However, the reasons
underlying its success have not been thoroughly investigated. In this paper, we
comprehensively analyze $k$NN-MT through theoretical and empirical studies.
Initially, we provide new insights into the working mechanism of $k$NN-MT as an
efficient technique to implicitly execute gradient descent on the output
projection layer of NMT, indicating that it is a specific case of model
fine-tuning. Subsequently, we conduct multi-domain experiments and word-level
analysis to examine the differences in performance between $k$NN-MT and
entire-model fine-tuning. Our findings suggest that: (1) Incorporating $k$NN-MT
with adapters yields comparable translation performance to fine-tuning on
in-domain test sets, while achieving better performance on out-of-domain test
sets; (2) Fine-tuning significantly outperforms $k$NN-MT on the recall of
in-domain low-frequency words, but this gap could be bridged by optimizing the
context representations with additional adapter layers.
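For readers unfamiliar with the mechanism being analyzed, here is a minimal sketch of one vanilla $k$NN-MT decoding step, i.e. interpolating the NMT softmax with a distribution built from retrieved datastore entries; the function names, the fixed interpolation weight `lam`, and the temperature are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def knn_mt_step(query, datastore_keys, datastore_values, nmt_logits,
                k=8, temperature=10.0, lam=0.5):
    """One vanilla kNN-MT decoding step: interpolate the NMT softmax with a
    distribution built from the k nearest (key, value) datastore entries."""
    # L2 distances between the decoder hidden state (query) and all stored keys.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]

    # Turn the retrieved neighbors into a distribution over the vocabulary.
    p_knn = np.zeros_like(nmt_logits)
    weights = np.exp(-dists[nearest] / temperature)
    for idx, w in zip(nearest, weights):
        p_knn[datastore_values[idx]] += w
    p_knn /= p_knn.sum()

    # Softmax over the NMT logits, then interpolate the two distributions.
    p_nmt = np.exp(nmt_logits - nmt_logits.max())
    p_nmt /= p_nmt.sum()
    return lam * p_knn + (1 - lam) * p_nmt

# Toy usage: 100 datastore entries, hidden size 4, vocabulary size 10.
rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 4))
values = rng.integers(0, 10, size=100)
probs = knn_mt_step(rng.normal(size=4), keys, values, rng.normal(size=10))
print(probs.sum())  # ~1.0
```

Under the paper's analysis, adding the retrieval distribution in this way behaves like an implicit gradient-descent update confined to the output projection layer, which is what motivates the comparison with explicit fine-tuning.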
Related papers
- Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval [49.825549809652436]
$k$NN-MT constructs an external datastore to store domain-specific translation knowledge.
Adaptive retrieval ($k$NN-MT-AR) dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ falls below a fixed threshold.
We propose dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects.
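Below is a minimal sketch of the retrieval-skipping idea described above for $k$NN-MT-AR, assuming some estimator of $\lambda$ is available; `estimate_lambda` and the threshold value are hypothetical placeholders, not the actual components of either method.

```python
import numpy as np

def adaptive_knn_step(query, keys, values, nmt_probs, estimate_lambda,
                      threshold=0.25, k=8, temperature=10.0):
    """Skip kNN retrieval entirely whenever the estimated interpolation
    weight lambda falls below a fixed threshold."""
    lam = estimate_lambda(query)           # hypothetical lambda predictor
    if lam < threshold:
        return nmt_probs                   # retrieval skipped: NMT output only
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    p_knn = np.zeros_like(nmt_probs)
    for idx, w in zip(nearest, np.exp(-dists[nearest] / temperature)):
        p_knn[values[idx]] += w
    p_knn /= p_knn.sum()
    return lam * p_knn + (1 - lam) * nmt_probs
```

Skipping retrieval on tokens where the in-domain datastore is unlikely to help is what yields the speed-up, since the nearest-neighbor search is the expensive part of each decoding step.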
arXiv Detail & Related papers (2024-06-10T07:36:55Z)
- Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it.
We conduct experiments on the NIST Chinese to English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
- Simple and Scalable Nearest Neighbor Machine Translation [11.996135740547897]
$k$NN-MT is a powerful approach for fast domain adaptation.
We propose a simple and scalable nearest neighbor machine translation framework.
Our proposed approach runs at almost 90% of the speed of the base NMT model without performance degradation.
arXiv Detail & Related papers (2023-02-23T17:28:29Z)
- DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation [10.03007605098947]
Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model that is adapted to the new domain using a sample of in-domain parallel data.
We propose a Domain Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language.
arXiv Detail & Related papers (2022-04-20T06:57:48Z)
- Efficient Cluster-Based k-Nearest-Neighbor Machine Translation [65.69742565855395]
k-Nearest-Neighbor Machine Translation (kNN-MT) has recently been proposed as a non-parametric solution for domain adaptation in neural machine translation (NMT).
arXiv Detail & Related papers (2022-04-13T05:46:31Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown the promising capability of directly combining the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
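In the standard $k$NN-MT setup, the datastore is simply a table of (decoder hidden state, next target token) pairs collected while force-decoding reference translations; the sketch below shows only that generic construction, not this paper's procedure for building it from target-language monolingual sentences, and the names are illustrative.

```python
import numpy as np

def build_datastore(decoder_states, next_tokens):
    """Generic kNN-MT datastore construction: every decoder hidden state seen
    while (force-)decoding a target sentence becomes a key, and the gold next
    target token becomes the corresponding value."""
    keys = np.stack(decoder_states)      # shape: (num_tokens, hidden_dim)
    values = np.asarray(next_tokens)     # shape: (num_tokens,)
    return keys, values
```

The resulting keys and values are exactly what the retrieval step sketched earlier consumes.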
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
- Understanding Learning Dynamics for Neural Machine Translation [53.23463279153577]
We propose to understand the learning dynamics of NMT by using Loss Change Allocation (LCA) (Lan et al., 2019).
As LCA requires calculating the gradient over the entire dataset for each update, we instead present an approximation to make it practical in the NMT scenario.
Our simulated experiments show that this approximate calculation is efficient and empirically delivers consistent results.
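For context, Loss Change Allocation attributes the change in training loss between consecutive updates to individual parameters through a first-order term; the snippet below is a minimal sketch of that allocation rule, not the NMT-specific approximation this paper proposes.

```python
import numpy as np

def loss_change_allocation(grads, deltas):
    """First-order LCA: the loss change credited to a parameter tensor is
    sum(grad * delta_theta); summing over all tensors approximates the total
    loss change caused by one update."""
    return {name: float(np.sum(grads[name] * deltas[name])) for name in grads}

# Toy usage with two parameter tensors.
g = {"W": np.array([[0.1, -0.2]]), "b": np.array([0.05])}
d = {"W": np.array([[-0.01, 0.02]]), "b": np.array([-0.005])}
print(loss_change_allocation(g, d))  # per-tensor contributions to the loss change
```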
arXiv Detail & Related papers (2020-04-05T13:32:58Z)
- A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.