Exploring Monolingual Data for Neural Machine Translation with Knowledge
Distillation
- URL: http://arxiv.org/abs/2012.15455v1
- Date: Thu, 31 Dec 2020 05:28:42 GMT
- Title: Exploring Monolingual Data for Neural Machine Translation with Knowledge
Distillation
- Authors: Alham Fikri Aji, Kenneth Heafield
- Abstract summary: We explore two types of monolingual data that can be included in knowledge distillation training for neural machine translation (NMT).
We find that source-side monolingual data improves model performance when evaluated on test sets originating from the source side.
We also show that it is not required to train the student model with the same data used by the teacher, as long as the domains are the same.
- Score: 10.745228927771915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore two types of monolingual data that can be included in
knowledge distillation training for neural machine translation (NMT). The
first is source-side monolingual data. The second is target-side monolingual
data, which is used as back-translation data. Both datasets are
(forward-)translated from the source language to the target language by a
teacher model and then combined into a dataset for smaller student models. We
find that source-side monolingual data improves model performance when
evaluated on test sets originating from the source side. Likewise, target-side
data has a positive effect on test sets originating from the target side. We
also show that the student model does not need to be trained on the same data
used by the teacher, as long as the domains are the same. Finally, we find
that combining source-side and target-side monolingual data yields better
performance than relying on just one side.
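As a rough illustration of the data-construction step described above, the
Python sketch below combines both kinds of monolingual data into a single
distillation set. It is not the authors' released code; `teacher_translate`
and `backward_translate` are hypothetical stand-ins for the source-to-target
teacher model and the target-to-source back-translation model.

    from typing import Callable, List, Tuple

    # A translation function maps a batch of sentences to their translations.
    TranslateFn = Callable[[List[str]], List[str]]

    def build_distillation_data(
        src_mono: List[str],              # source-side monolingual sentences
        tgt_mono: List[str],              # target-side monolingual sentences
        teacher_translate: TranslateFn,   # hypothetical: source -> target teacher
        backward_translate: TranslateFn,  # hypothetical: target -> source model
    ) -> List[Tuple[str, str]]:
        """Assemble (source, teacher translation) pairs from both monolingual sides."""
        pairs: List[Tuple[str, str]] = []

        # Source-side monolingual data: label it with the teacher's forward translation.
        pairs.extend(zip(src_mono, teacher_translate(src_mono)))

        # Target-side monolingual data: back-translate it into the source language,
        # then label the synthetic source with the teacher's forward translation.
        synthetic_src = backward_translate(tgt_mono)
        pairs.extend(zip(synthetic_src, teacher_translate(synthetic_src)))

        return pairs

The resulting (source, teacher translation) pairs would then serve as training
data for the smaller student model, as described in the abstract.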
Related papers
- Cross-lingual Transfer or Machine Translation? On Data Augmentation for
Monolingual Semantic Textual Similarity [2.422759879602353]
Cross-lingual transfer of Wikipedia data improves performance on monolingual STS.
We find that the Wikipedia domain is superior to the NLI domain for these languages, in contrast to prior studies that focused on NLI as training data.
arXiv Detail & Related papers (2024-03-08T12:28:15Z)
- When Does Monolingual Data Help Multilingual Translation: The Role of Domain
and Model Scale [73.69252847606212]
We examine how denoising autoencoding (DAE) and backtranslation (BT) impact multilingual machine translation (MMT).
We find that monolingual data generally helps MMT, but models are surprisingly brittle to domain mismatches, especially at smaller model scales.
As scale increases, DAE transitions from underperforming the parallel-only baseline at 90M parameters to converging with BT performance at 1.6B parameters, and even surpassing it in low-resource settings.
arXiv Detail & Related papers (2023-05-23T14:48:42Z)
- UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource
Neural Machine Translation [102.04003089261761]
Multilingual neural machine translation (MNMT) enables one-pass translation using shared semantic space for all languages.
We propose a novel method, named the Unified Multilingual Multiple teacher-student Model for NMT (UM4).
Our method unifies source-teacher, target-teacher, and pivot-teacher models to guide the student model for the zero-resource translation.
arXiv Detail & Related papers (2022-07-11T14:22:59Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural
Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with translated source sentences, but translates natural source sentences at inference.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo-parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Multilingual Neural Semantic Parsing for Low-Resourced Languages [1.6244541005112747]
We introduce a new multilingual semantic parsing dataset in English, Italian and Japanese.
We show that joint multilingual training with pretrained encoders substantially outperforms our baselines on the TOP dataset.
We find that a semantic parser trained only on English data achieves a zero-shot performance of 44.9% exact-match accuracy on Italian sentences.
arXiv Detail & Related papers (2021-06-07T09:53:02Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- A Hybrid Approach for Improved Low Resource Neural Machine Translation using
Monolingual Data [0.0]
Many language pairs are low resource, meaning the amount and/or quality of available parallel data is not sufficient to train a neural machine translation (NMT) model.
This work proposes a novel approach that enables both the backward and forward models to benefit from the monolingual target data.
arXiv Detail & Related papers (2020-11-14T22:18:45Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems from WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Leveraging Monolingual Data with Self-Supervision for Multilingual Neural
Machine Translation [54.52971020087777]
Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models.
Self-supervision improves zero-shot translation quality in multilingual models.
We get up to 33 BLEU on ro-en translation without any parallel data or back-translation.
arXiv Detail & Related papers (2020-05-11T00:20:33Z)