Continual Knowledge Distillation for Neural Machine Translation
- URL: http://arxiv.org/abs/2212.09097v2
- Date: Mon, 12 Jun 2023 12:00:20 GMT
- Title: Continual Knowledge Distillation for Neural Machine Translation
- Authors: Yuanchi Zhang, Peng Li, Maosong Sun, Yang Liu
- Abstract summary: Many parallel corpora are not publicly accessible for data copyright, data privacy and competitive differentiation reasons.
We propose a method called continual knowledge distillation to take advantage of existing translation models to improve one model of interest.
- Score: 74.03622486218597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While many parallel corpora are not publicly accessible for data copyright,
data privacy and competitive differentiation reasons, trained translation
models are increasingly available on open platforms. In this work, we propose a
method called continual knowledge distillation to take advantage of existing
translation models to improve one model of interest. The basic idea is to
sequentially transfer knowledge from each trained model to the distilled model.
Extensive experiments on Chinese-English and German-English datasets show that
our method achieves significant and consistent improvements over strong
baselines under both homogeneous and heterogeneous trained model settings and
is robust to malicious models.
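To make the idea concrete, below is a minimal sketch of sequential knowledge distillation in PyTorch: each available teacher is visited in turn and its output distribution is distilled into the model of interest alongside the usual translation loss. The function and variable names (distill_step, teachers, student, dataloader) and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, optimizer, alpha=0.5, T=2.0):
    """One continual-KD update: mix the translation loss with a KL term
    that pulls the student toward the current teacher (illustrative only)."""
    src, tgt_in, tgt_out = batch              # token id tensors
    student_logits = student(src, tgt_in)     # (B, L, V)
    with torch.no_grad():
        teacher_logits = teacher(src, tgt_in)

    ce = F.cross_entropy(student_logits.flatten(0, 1), tgt_out.flatten())
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    loss = (1 - alpha) * ce + alpha * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Sequentially transfer knowledge from each trained model to the student:
# for teacher in teachers:          # models obtained from open platforms
#     for batch in dataloader:
#         distill_step(student, teacher, batch, optimizer)
```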
Related papers
- Efficient Machine Translation with a BiLSTM-Attention Approach [0.0]
This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model.
The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the context information of the input sequence.
Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset.
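For reference, the encoder described above can be expressed in a few lines of PyTorch; this is a generic Bi-LSTM encoder sketch with placeholder hyperparameters, not the paper's implementation.

```python
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Encode a source sentence into context vectors for attention."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)

    def forward(self, src_ids):
        # outputs: (B, L, 2*hidden_dim), one context vector per source token
        outputs, (h_n, c_n) = self.lstm(self.embed(src_ids))
        return outputs, (h_n, c_n)
```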
arXiv Detail & Related papers (2024-10-29T01:12:50Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
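As an illustration of what the mask-infilling objective might look like in practice, the snippet below masks a content word in a commonsense assertion and scores only the masked position; the bert-base-uncased checkpoint and the example sentence are assumptions made for this sketch, not details from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# A commonsense assertion extracted from a knowledge model (hypothetical example).
text = "You are likely to find a fork in the kitchen ."

# Commonsense mask infilling: hide a content word and train the LM to recover it.
masked = text.replace("kitchen", tokenizer.mask_token)
inputs = tokenizer(masked, return_tensors="pt")
labels = tokenizer(text, return_tensors="pt")["input_ids"]
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # score only the mask

loss = model(**inputs, labels=labels).loss
loss.backward()  # one refinement step on the infilling objective
```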
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), which works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
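In its simplest form, merging in parameter space is a weighted average of matched parameters across models that share an architecture. The sketch below shows that baseline; the paper proposes a more sophisticated merging scheme than plain averaging, so treat this purely as an illustration of the general idea.

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average matched parameters across models with the same architecture."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage, assuming a list of fine-tuned models of the same shape:
# merged = merge_state_dicts([m.state_dict() for m in finetuned_models])
# fused_model.load_state_dict(merged)
```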
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Efficient Speech Translation with Pre-trained Models [13.107314023500349]
We investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models.
While end-to-end models show superior translation performance to cascaded ones, their application is limited by the need for additional end-to-end training data.
arXiv Detail & Related papers (2022-11-09T15:07:06Z)
- Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models [12.670354498961492]
State-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages.
Knowledge Distillation is one popular technique to develop competitive, lightweight models.
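For the quantization side of that comparison, post-training dynamic quantization in PyTorch is a common lightweight baseline; the toy model below is a placeholder and does not reproduce the paper's setup.

```python
import torch
import torch.nn as nn

# Placeholder network; in practice this would be a trained NMT model.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))

# Post-training dynamic quantization: weights stored in int8, activations
# quantized on the fly, shrinking the model and speeding up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```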
arXiv Detail & Related papers (2022-10-27T05:30:13Z)
- A Hybrid Approach for Improved Low Resource Neural Machine Translation using Monolingual Data [0.0]
Many language pairs are low resource, meaning the amount and/or quality of available parallel data is not sufficient to train a neural machine translation (NMT) model.
This work proposes a novel approach that enables both the backward and forward models to benefit from the monolingual target data.
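Schematically, such hybrid methods build on the back-translation loop sketched below, where monolingual target sentences are turned into synthetic parallel data and both translation directions are retrained. The object interfaces (translate, train_on) are hypothetical placeholders, not an actual toolkit API or the paper's exact procedure.

```python
def hybrid_round(forward_model, backward_model, parallel_data, mono_target):
    """One schematic round of a back-translation-style loop (placeholder API)."""
    # Backward model turns monolingual target sentences into synthetic sources.
    synthetic_src = [backward_model.translate(t) for t in mono_target]
    synthetic_pairs = list(zip(synthetic_src, mono_target))

    # Forward model trains on real parallel data plus the synthetic pairs.
    forward_model.train_on(parallel_data + synthetic_pairs)

    # The backward model is likewise refreshed so both directions improve.
    backward_model.train_on([(t, s) for s, t in parallel_data + synthetic_pairs])
    return forward_model, backward_model
```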
arXiv Detail & Related papers (2020-11-14T22:18:45Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
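As a rough illustration of source-aware blocking during decoding, the toy function below forbids continuing a verbatim copy of the source once the last generated token matches a source token. This is a simplification for intuition only, not the paper's exact Dynamic Blocking algorithm, and the helper name is hypothetical.

```python
import torch

def block_source_continuation(logits, source_ids, generated_ids):
    """Toy constraint: if the last generated token occurs in the source,
    forbid emitting the token that immediately follows it in the source,
    discouraging verbatim copying. Simplified for illustration only."""
    if len(generated_ids) == 0:
        return logits
    last = generated_ids[-1]
    for i in range(len(source_ids) - 1):
        if source_ids[i] == last:
            logits[source_ids[i + 1]] = float("-inf")
    return logits
```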
arXiv Detail & Related papers (2020-10-24T11:55:28Z)