Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
- URL: http://arxiv.org/abs/2003.02877v3
- Date: Tue, 23 Jun 2020 17:21:56 GMT
- Title: Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
- Authors: Mitchell A. Gordon, Kevin Duh
- Abstract summary: We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation.
Our large-scale empirical results in machine translation suggest distilling twice for best performance.
- Score: 12.949219829789874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore best practices for training small, memory efficient machine
translation models with sequence-level knowledge distillation in the domain
adaptation setting. While both domain adaptation and knowledge distillation are
widely-used, their interaction remains little understood. Our large-scale
empirical results in machine translation (on three language pairs with three
domains each) suggest distilling twice for best performance: once using
general-domain data and again using in-domain data with an adapted teacher.
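To make the recipe concrete, the sketch below outlines the distill-adapt-distill pipeline in Python. It is a minimal, hypothetical outline: the helpers `train`, `finetune`, and `beam_decode` stand in for whatever NMT toolkit is actually used, and only the ordering of the steps follows the abstract.

```python
# Minimal sketch of the "distill, adapt, distill" recipe for sequence-level
# knowledge distillation under domain adaptation. The helpers passed in
# (train, finetune, beam_decode) are hypothetical placeholders for an NMT
# toolkit; only the ordering of the steps follows the recipe above.

from typing import Callable, List, Tuple

Corpus = List[Tuple[str, str]]  # (source sentence, target sentence) pairs


def distill(decode: Callable[[List[str]], List[str]], sources: List[str]) -> Corpus:
    """Sequence-level KD: pair each source with the teacher's beam output."""
    return list(zip(sources, decode(sources)))


def distill_adapt_distill(train, finetune, beam_decode,
                          general: Corpus, in_domain: Corpus):
    # 1) Train a large general-domain teacher.
    teacher = train(general, size="large")

    # 2) First distillation: train a small student on general-domain data
    #    re-labeled by the teacher.
    general_kd = distill(lambda srcs: beam_decode(teacher, srcs),
                         [src for src, _ in general])
    student = train(general_kd, size="small")

    # 3) Adapt the teacher to the target domain by continued training.
    adapted_teacher = finetune(teacher, in_domain)

    # 4) Second distillation: fine-tune the student on in-domain data
    #    re-labeled by the adapted teacher.
    in_domain_kd = distill(lambda srcs: beam_decode(adapted_teacher, srcs),
                           [src for src, _ in in_domain])
    return finetune(student, in_domain_kd)
```

Here the second distillation continues training the student from the first pass; the paper compares several such configurations, so treat this as one plausible instantiation rather than the definitive recipe.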
Related papers
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that straightforward width scaling of the Transformer is a simpler and, in practice, surprisingly more efficient approach that reaches the same performance level as SMoE.
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$^3$G to learn domain-specific models.
Our results show that D$3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- $m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter [128.69723410769586]
Multilingual neural machine translation (MNMT) models yield state-of-the-art performance when evaluated on the domains and language pairs seen during training.
When an MNMT model is used to translate under domain shift or for a new language pair, performance drops dramatically.
We propose $m^4Adapter$, which combines domain and language knowledge using meta-learning with adapters.
arXiv Detail & Related papers (2022-10-21T12:25:05Z)
- Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation [7.2283509416724465]
General translation models often struggle to generate accurate translations in specialized domains.
We conduct an in-depth empirical exploration of monolingual and parallel data approaches to domain adaptation.
Our work includes three domains: consumer electronics, clinical, and biomedical.
arXiv Detail & Related papers (2022-06-02T16:38:33Z)
- Improving both domain robustness and domain adaptability in machine translation [69.15496930090403]
We address two problems of domain adaptation in neural machine translation.
First, we want to reach domain robustness, i.e., good quality on both domains seen in the training data.
Second, we want our systems to be adaptive, i.e., it should be possible to fine-tune them with just hundreds of in-domain parallel sentences.
arXiv Detail & Related papers (2021-12-15T17:34:59Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown promise in directly augmenting a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
- Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey [9.645196221785694]
We focus on robust approaches to domain adaptation for Neural Machine Translation (NMT) models.
In particular, we look at the case where a system may need to translate sentences from multiple domains.
We highlight the benefits of domain adaptation and multi-domain adaptation techniques to other lines of NMT research.
arXiv Detail & Related papers (2021-04-14T16:21:37Z)
- Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation [9.403585397617865]
Domain adaptation is widely used in practical applications of neural machine translation.
The existing methods for domain adaptation usually suffer from catastrophic forgetting, domain divergence, and model explosion.
We propose a "divide and conquer" method based on the importance of neurons or parameters in the translation model.
arXiv Detail & Related papers (2021-03-25T08:57:09Z)
- Unsupervised Neural Machine Translation for Low-Resource Domains via Meta-Learning [27.86606560170401]
We present a novel meta-learning algorithm for unsupervised neural machine translation (UNMT).
We train the model to adapt to another domain by utilizing only a small amount of training data.
Our model surpasses a transfer learning-based approach by 2-4 BLEU points.
arXiv Detail & Related papers (2020-10-18T17:54:13Z)
- Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation [0.0]
Lack of specialized data makes building a multi-domain neural machine translation tool challenging.
We propose a new training pipeline where knowledge distillation and multiple specialized teachers allow us to efficiently finetune a model.
arXiv Detail & Related papers (2020-04-15T20:21:19Z)
- A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)