Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation
- URL: http://arxiv.org/abs/2103.13678v1
- Date: Thu, 25 Mar 2021 08:57:09 GMT
- Title: Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation
- Authors: Shuhao Gu, Yang Feng, Wanying Xie
- Abstract summary: Domain adaptation is widely used in practical applications of neural machine translation.
Existing methods for domain adaptation usually suffer from catastrophic forgetting, domain divergence, and model explosion.
We propose a method of "divide and conquer" which is based on the importance of neurons or parameters in the translation model.
- Score: 9.403585397617865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain adaptation is widely used in practical applications of neural machine
translation; it aims to achieve good performance on both general-domain and
in-domain data. However, existing methods for domain adaptation usually
suffer from catastrophic forgetting, domain divergence, and model explosion. To
address these three problems, we propose a method of "divide and conquer" which
is based on the importance of neurons or parameters in the translation model.
In our method, we first prune the model and only keep the important neurons or
parameters, making them responsible for both general-domain and in-domain
translation. Then we further train the pruned model with knowledge distillation,
supervised by the original unpruned model. Finally, we expand the model back
to its original size and fine-tune the added parameters for in-domain
translation. We conduct experiments on different languages and domains and the
results show that our method can achieve significant improvements compared with
several strong baselines.
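The following is a minimal PyTorch-style sketch of the three stages described in the abstract (prune, distill, expand-and-fine-tune). It is not the authors' implementation: importance is approximated here by weight magnitude, whereas the paper scores importance on general-domain data, and keep_ratio, alpha, and T are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def magnitude_masks(model, keep_ratio=0.7):
    """Per-tensor pruning masks: 1 marks 'important' weights to keep, 0 marks
    weights to prune. Magnitude is only a stand-in for the importance scores
    the paper computes from general-domain data."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                      # skip biases / layer norms
            continue
        k = max(1, int(p.numel() * keep_ratio))
        thresh = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
        masks[name] = (p.detach().abs() >= thresh).float()
    return masks

def apply_masks(model, masks):
    """Zero out the pruned (unimportant) parameters."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def kd_loss(student_logits, teacher_logits, gold, alpha=0.5, T=1.0):
    """Train the pruned student on gold targets while distilling from the
    original unpruned teacher (cross-entropy + temperature-scaled KL)."""
    ce = F.cross_entropy(student_logits, gold)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return alpha * ce + (1.0 - alpha) * kl

def restrict_to_expanded(model, masks):
    """Call after loss.backward() during in-domain fine-tuning: zero the
    gradients of the kept parameters so only the re-added (expanded)
    capacity is updated for the new domain."""
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(1.0 - masks[name])
```

In this reading, the kept sub-network remains responsible for general-domain translation, while the expanded slots absorb the in-domain data.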
Related papers
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments to validate the utility of SMoE for the multi-domain scenario and find that straightforward width scaling of the Transformer is a simpler and surprisingly more efficient approach in practice, reaching the same performance level as SMoE.
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$3$G to learn domain-specific models.
Our results show that D$3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noise.
Deep models only know the training domain style.
We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z)
- Understanding Domain Learning in Language Models Through Subpopulation Analysis [35.16003054930906]
We investigate how different domains are encoded in modern neural network architectures.
We analyze the relationship between natural language domains, model size, and the amount of training data used.
arXiv Detail & Related papers (2022-10-22T21:12:57Z)
- QAGAN: Adversarial Approach To Learning Domain Invariant Language Features [0.76146285961466]
We explore an adversarial training approach to learning domain-invariant features.
We achieve a 15.2% improvement in EM score and a 5.6% boost in F1 score on the out-of-domain validation dataset.
arXiv Detail & Related papers (2022-06-24T17:42:18Z)
- Efficient Machine Translation Domain Adaptation [7.747003493657217]
Machine translation models struggle when translating out-of-domain text.
Existing domain adaptation methods focus on fine-tuning or training all or part of the model on every new domain.
We introduce a simple but effective caching strategy that avoids performing retrieval when similar contexts have been seen before.
arXiv Detail & Related papers (2022-04-26T21:47:54Z)
- Efficient Hierarchical Domain Adaptation for Pretrained Language Models [77.02962815423658]
Generative language models are trained on diverse, general domain corpora.
We introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach.
arXiv Detail & Related papers (2021-12-16T11:09:29Z)
- Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2020-10-06T04:38:09Z)
- A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation (a minimal sketch of such a loop appears after this list).
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
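As a rough illustration of the last entry's three-objective training loop, here is a hedged sketch. The model interface (lm_loss, translate, mt_loss) and the number of rounds are placeholders introduced for illustration, not the paper's actual API.

```python
import torch

def semi_supervised_rounds(model, optimizer, mono_tgt, parallel, rounds=3):
    """Illustrative outer loop cycling through language modeling,
    back-translation, and supervised translation. The helper methods on
    `model` are hypothetical placeholders, not a real library API."""
    def step(loss):
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    for _ in range(rounds):
        # 1) Language modeling on in-domain monolingual target-side text.
        for tgt_batch in mono_tgt:
            step(model.lm_loss(tgt_batch))

        # 2) Back-translation: generate synthetic sources for the monolingual
        #    target data, then train on the resulting synthetic pairs.
        with torch.no_grad():
            synthetic = [(model.translate(t, direction="tgt->src"), t)
                         for t in mono_tgt]
        for src_batch, tgt_batch in synthetic:
            step(model.mt_loss(src_batch, tgt_batch))

        # 3) Supervised translation on the available parallel data.
        for src_batch, tgt_batch in parallel:
            step(model.mt_loss(src_batch, tgt_batch))
```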
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.