Building a Multi-domain Neural Machine Translation Model using Knowledge
Distillation
- URL: http://arxiv.org/abs/2004.07324v1
- Date: Wed, 15 Apr 2020 20:21:19 GMT
- Title: Building a Multi-domain Neural Machine Translation Model using Knowledge
Distillation
- Authors: Idriss Mghabbar, Pirashanth Ratnamogan
- Abstract summary: Lack of specialized data makes building a multi-domain neural machine translation tool challenging.
We propose a new training pipeline where knowledge distillation and multiple specialized teachers allow us to efficiently finetune a model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lack of specialized data makes building a multi-domain neural machine
translation tool challenging. Although the emerging literature on low-resource
languages is starting to show promising results, most state-of-the-art models
were trained on millions of sentences. Today, the majority of multi-domain
adaptation techniques rely on complex and sophisticated architectures that are
not suited to real-world applications. So far, no scalable method performs
better than the simple yet effective mixed-finetuning, i.e., finetuning a
generic model on a mix of all specialized data and generic data.
In this paper, we propose a new training pipeline where knowledge distillation
and multiple specialized teachers allow us to efficiently finetune a model
without adding new costs at inference time. Our experiments show that this
training pipeline improves multi-domain translation over finetuning by up to
2 BLEU points in configurations with 2, 3, and 4 domains.
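The abstract does not spell out the exact distillation objective, so the following is only a minimal sketch, in PyTorch, of one common way to combine the usual cross-entropy translation loss with word-level knowledge distillation from a set of per-domain teachers. The function name, the interpolation weight alpha, and the temperature are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's exact objective): word-level knowledge
# distillation from several frozen domain-specific teachers into one student.
import torch
import torch.nn.functional as F


def multi_teacher_kd_loss(student_logits, teacher_logits_by_domain, domain_ids,
                          target_ids, pad_id, alpha=0.5, temperature=2.0):
    """student_logits:           (batch, seq_len, vocab)
    teacher_logits_by_domain: dict domain_id -> (batch, seq_len, vocab),
                              computed under torch.no_grad()
    domain_ids:               (batch,) domain label of each sentence
    target_ids:               (batch, seq_len) reference tokens
    alpha, temperature:       assumed hyper-parameters, not from the paper
    """
    vocab = student_logits.size(-1)

    # Standard translation loss against the reference tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         target_ids.view(-1), ignore_index=pad_id)

    # Each sentence is distilled from the teacher of its own domain.
    kd_terms = []
    for domain, t_logits in teacher_logits_by_domain.items():
        mask = domain_ids == domain
        if not mask.any():
            continue
        s = F.log_softmax(student_logits[mask] / temperature, dim=-1)
        t = F.softmax(t_logits[mask] / temperature, dim=-1)
        kd_terms.append(F.kl_div(s, t, reduction="batchmean") * temperature ** 2)

    kd = (torch.stack(kd_terms).mean() if kd_terms
          else torch.zeros((), device=student_logits.device))
    return alpha * ce + (1.0 - alpha) * kd
```

In such a setup each teacher would be a model finetuned on a single domain and kept frozen; only the student is updated, so inference cost stays that of a single model, which matches the paper's stated constraint of adding no new cost at inference time.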
Related papers
- Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains.
We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality.
Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
arXiv Detail & Related papers (2024-09-19T21:45:13Z) - Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models [93.92762966380793]
Large language models (LLMs) strive to achieve high performance across the text, code, and math domains simultaneously.
In this paper, we propose to directly fuse models that are already highly specialized.
The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics.
arXiv Detail & Related papers (2024-03-13T06:18:48Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - Domain-Specific Text Generation for Machine Translation [7.803471587734353]
We propose a novel approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation.
We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts.
arXiv Detail & Related papers (2022-08-11T16:22:16Z) - Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z) - Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z) - Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural
Machine Translation Training [58.72619374790418]
MultiUAT dynamically adjusts the training data usage based on the model's uncertainty.
We analyze cross-domain transfer and show the deficiency of static and similarity-based methods.
arXiv Detail & Related papers (2021-09-06T08:30:33Z) - Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining [37.2106265998237]
We propose an effective learning procedure named Meta Fine-Tuning (MFT).
MFT serves as a meta-learner to solve a group of similar NLP tasks for neural language models.
We implement MFT upon BERT to solve several multi-domain text mining tasks.
arXiv Detail & Related papers (2020-03-29T11:27:10Z) - Distill, Adapt, Distill: Training Small, In-Domain Models for Neural
Machine Translation [12.949219829789874]
We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation.
Our large-scale empirical results in machine translation suggest distilling twice for best performance.
arXiv Detail & Related papers (2020-03-05T19:14:33Z)