UM4: Unified Multilingual Multiple Teacher-Student Model for
Zero-Resource Neural Machine Translation
- URL: http://arxiv.org/abs/2207.04900v1
- Date: Mon, 11 Jul 2022 14:22:59 GMT
- Title: UM4: Unified Multilingual Multiple Teacher-Student Model for
Zero-Resource Neural Machine Translation
- Authors: Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Shuangzhi Wu,
Hongcheng Guo, Zhoujun Li, Furu Wei
- Abstract summary: Multilingual neural machine translation (MNMT) enables one-pass translation using shared semantic space for all languages.
We propose a novel method named the Unified Multilingual Multiple teacher-student Model for NMT (UM4).
Our method unifies source-teacher, target-teacher, and pivot-teacher models to guide the student model for zero-resource translation.
- Score: 102.04003089261761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most translation tasks among languages belong to the zero-resource
translation problem, where parallel corpora are unavailable. Multilingual
neural machine translation (MNMT) enables one-pass translation through a
shared semantic space for all languages, in contrast to two-pass pivot
translation, yet it often underperforms the pivot-based method. In this paper,
we propose a novel method named the Unified Multilingual Multiple
teacher-student Model for NMT (UM4). Our method unifies source-teacher,
target-teacher, and pivot-teacher models to guide the student model for
zero-resource translation. The source teacher and target teacher force the
student to learn the direct source-to-target translation by distilling
knowledge on both the source and target sides. The monolingual corpus is
further leveraged by the pivot-teacher model to enhance the student model.
Experimental results demonstrate that our model significantly outperforms
previous methods on 72 directions of the WMT benchmark.
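As a rough illustration of the multi-teacher distillation described above, the sketch below combines a cross-entropy term on the reference translation with KL terms toward several teacher distributions. The weighting scheme, function names, and toy dimensions are illustrative assumptions, not the paper's exact training objective.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), averaged over sequence positions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def multi_teacher_distillation_loss(student_logits, teacher_probs_list,
                                    teacher_weights, target_ids, ce_weight=1.0):
    """Cross-entropy on the reference plus weighted KL terms toward each teacher.

    student_logits:     (seq_len, vocab) unnormalized student scores
    teacher_probs_list: list of (seq_len, vocab) teacher distributions
    teacher_weights:    per-teacher interpolation weights (hypothetical choice)
    target_ids:         (seq_len,) reference token ids
    """
    student_probs = softmax(student_logits)
    # Supervised term: cross-entropy against the reference translation.
    picked = student_probs[np.arange(len(target_ids)), target_ids]
    ce = float(np.mean(-np.log(picked + 1e-12)))
    # Distillation terms: pull the student toward each teacher's distribution.
    kd = sum(w * kl_divergence(t, student_probs)
             for w, t in zip(teacher_weights, teacher_probs_list))
    return ce_weight * ce + kd

# Toy example: 3 teachers (source, target, pivot), 4 positions, vocab of 5.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 5))
teachers = [softmax(rng.normal(size=(4, 5))) for _ in range(3)]
loss = multi_teacher_distillation_loss(student, teachers, [0.4, 0.4, 0.2],
                                       target_ids=np.array([1, 0, 3, 2]))
print(round(loss, 4))
```

In a real system the three teachers would be full NMT models scoring the same target sequence; here random distributions stand in so the arithmetic is runnable.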
Related papers
- MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation [61.65537912700187]
Large Language Models (LLMs) have demonstrated strong ability in the field of machine translation (MT).
We propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner.
arXiv Detail & Related papers (2024-03-14T16:07:39Z)
- Distilling Efficient Language-Specific Models for Cross-Lingual Transfer [75.32131584449786]
Massively multilingual Transformers (MMTs) are widely used for cross-lingual transfer learning.
MMTs' language coverage makes them unnecessarily expensive to deploy in terms of model size, inference time, energy, and hardware cost.
We propose to extract compressed, language-specific models from MMTs which retain the capacity of the original MMTs for cross-lingual transfer.
arXiv Detail & Related papers (2023-06-02T17:31:52Z)
- Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation [23.401781865904386]
We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English.
For the first stage, we initialize an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English.
For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation.
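The second stage above can be sketched as a back-translation loop. The `translate` function and toy lexicon below are hypothetical stand-ins for the fine-tuned multilingual model; the point is only the data flow: target-side monolingual text is translated back to produce synthetic (source, target) training pairs.

```python
# A minimal sketch of one back-translation round. The toy lexicon replaces
# a real NMT model so the loop is runnable end to end.
TOY_LEXICON = {("de", "en"): {"hallo": "hello", "welt": "world"},
               ("en", "de"): {"hello": "hallo", "world": "welt"}}

def translate(sentence, src, tgt):
    """Stand-in for the model's decoder; real systems use beam search."""
    table = TOY_LEXICON[(src, tgt)]
    return " ".join(table.get(tok, tok) for tok in sentence.split())

def back_translation_round(monolingual, src, tgt):
    """Create synthetic (source, target) pairs from target-side monolingual text.

    Each target sentence is translated back into the source language; the
    (synthetic source, genuine target) pair then serves as training data
    for the src->tgt direction.
    """
    return [(translate(sent, tgt, src), sent) for sent in monolingual]

english_mono = ["hello world", "hello hello"]
synthetic = back_translation_round(english_mono, src="de", tgt="en")
print(synthetic)  # [('hallo welt', 'hello world'), ('hallo hallo', 'hello hello')]
```

Successive rounds alternate directions, retraining the model on the synthetic pairs before generating the next batch.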
arXiv Detail & Related papers (2022-09-06T21:20:41Z)
- Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training [21.934439663979663]
We propose a multi-branch multilingual language model (MBLM) built on multilingual pre-trained language models (MPLMs).
Our method is based on transferring knowledge from high-performance monolingual models with a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
arXiv Detail & Related papers (2022-02-28T09:51:32Z)
- Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help? [29.01386302441015]
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages.
The performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer.
We propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer.
arXiv Detail & Related papers (2021-10-15T02:31:48Z)
- Self-Learning for Zero Shot Neural Machine Translation [13.551731309506874]
This work proposes a novel zero-shot NMT modeling approach that learns without the now-standard assumption of a pivot language sharing parallel data.
Compared to unsupervised NMT, consistent improvements are observed even in a domain-mismatch setting.
arXiv Detail & Related papers (2021-03-10T09:15:19Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation [42.38435539241788]
Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios.
We propose an adaptive knowledge distillation approach to dynamically adjust the contribution of the teacher models during the distillation process.
Experiments on transferring from a collection of six language pairs from IWSLT to five low-resource language-pairs from TED Talks demonstrate the effectiveness of our approach.
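One way such adaptive weighting could look: score each teacher by its loss on the current batch and normalize with a softmax, so better-fitting teachers contribute more to the distillation signal. This softmax-over-negative-loss rule is a plausible sketch, not necessarily the paper's exact formula.

```python
import numpy as np

def adaptive_teacher_weights(teacher_losses, temperature=1.0):
    """Weight each teacher by how well it fits the current batch.

    Teachers with lower loss on the batch receive a larger share of the
    distillation signal; `temperature` controls how peaked the weighting is.
    """
    scores = -np.asarray(teacher_losses, dtype=float) / temperature
    scores -= scores.max()              # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()      # normalize to a distribution

# Teacher 1 fits this batch best (lowest loss), so it gets the largest weight.
w = adaptive_teacher_weights([2.3, 1.1, 3.0])
print(w.argmax())  # 1
```

Because the weights are recomputed per batch, a teacher that is strong on one language pair but weak on another shifts its influence dynamically during training.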
arXiv Detail & Related papers (2020-10-12T04:26:46Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Structure-Level Knowledge Distillation For Multilingual Sequence Labeling [73.40368222437912]
We propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (student).
Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
arXiv Detail & Related papers (2020-04-08T07:14:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.