On the Off-Target Problem of Zero-Shot Multilingual Neural Machine
Translation
- URL: http://arxiv.org/abs/2305.10930v3
- Date: Fri, 2 Jun 2023 02:58:03 GMT
- Title: On the Off-Target Problem of Zero-Shot Multilingual Neural Machine
Translation
- Authors: Liang Chen and Shuming Ma and Dongdong Zhang and Furu Wei and Baobao
Chang
- Abstract summary: We find that failing to encode a discriminative target-language signal leads to off-target translation, and that a closer lexical distance between two languages' vocabularies correlates with a higher off-target rate.
We propose Language Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
- Score: 104.85258654917297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While multilingual neural machine translation has achieved great success, it
suffers from the off-target issue, where the translation is in the wrong
language. This problem is more pronounced on zero-shot translation tasks. In
this work, we find that failing to encode a discriminative target-language
signal leads to off-target translation, and that a closer lexical distance
(measured by KL-divergence) between two languages' vocabularies correlates
with a higher off-target rate. We also find that merely isolating the
vocabularies of different languages in the decoder can alleviate the problem.
Motivated by these findings, we propose Language Aware Vocabulary Sharing
(LAVS), a simple and effective algorithm for constructing the multilingual
vocabulary that greatly alleviates the off-target problem by increasing the
KL-divergence between languages. We conduct experiments on a multilingual
machine translation benchmark in 11 languages. Experiments show that the
off-target rate across 90 translation tasks is reduced from 29% to 8%, while the overall BLEU score is
improved by an average of 1.9 points without extra training cost or sacrificing
the supervised directions' performance. We release the code at
https://github.com/PKUnlp-icler/Off-Target-MNMT for reproduction.
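As a concrete illustration of the lexical-distance diagnostic above, the sketch below estimates the KL-divergence between two languages' subword unigram distributions and shows the LAVS-style idea of splitting shared tokens into language-tagged variants. This is a minimal sketch assuming tokenized monolingual corpora and a shared vocabulary; the function names, smoothing constant, and `@A`/`@B` tags are illustrative and are not taken from the released code.

```python
from collections import Counter
from math import log

def unigram_dist(tokenized_corpus, vocab, eps=1e-9):
    """Smoothed unigram distribution of one language over a shared vocabulary.

    tokenized_corpus: iterable of token lists, assumed tokenized with `vocab`.
    """
    counts = Counter(tok for sent in tokenized_corpus for tok in sent)
    total = sum(counts[t] for t in vocab) + eps * len(vocab)
    return {t: (counts[t] + eps) / total for t in vocab}

def lexical_kl(p, q):
    """KL(p || q) over the shared vocabulary; a smaller value means the two
    languages' token usage overlaps more, which the paper links to a higher
    off-target rate."""
    return sum(p[t] * log(p[t] / q[t]) for t in p)

# LAVS-style idea in miniature: split tokens shared by two close languages
# into language-tagged variants so the decoder sees disjoint entries, which
# raises the KL-divergence between the languages.
def split_shared_tokens(vocab_a, vocab_b, tokens_to_split):
    vocab_a = {f"{t}@A" if t in tokens_to_split else t for t in vocab_a}
    vocab_b = {f"{t}@B" if t in tokens_to_split else t for t in vocab_b}
    return vocab_a, vocab_b
```

In this picture, a small `lexical_kl` between two languages signals heavy vocabulary overlap and, per the paper's finding, a higher risk of off-target translation; splitting shared tokens pushes that divergence up without retraining cost.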
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z) - ZGUL: Zero-shot Generalization to Unseen Languages using Multi-source
Ensembling of Language Adapters [29.211715245603234]
We tackle the problem of zero-shot cross-lingual transfer in NLP tasks via the use of language adapters (LAs).
Training a target LA requires unlabeled data, which may not be readily available for low-resource unseen languages.
We extend ZGUL to settings where either (1) some unlabeled data or (2) few-shot training examples are available for the target language.
arXiv Detail & Related papers (2023-10-25T06:22:29Z) - Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot
Translation [79.96416609433724]
Zero-shot translation (ZST) aims to translate between language pairs unseen in the training data.
The common practice for guiding the zero-shot language mapping during inference is to deliberately insert source and target language IDs.
Recent studies have shown that these language IDs sometimes fail to steer the ZST task, leaving models prone to the off-target problem.
arXiv Detail & Related papers (2023-09-28T17:02:36Z) - Mitigating Hallucinations and Off-target Machine Translation with
Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
arXiv Detail & Related papers (2023-09-13T17:15:27Z) - On the Copying Problem of Unsupervised NMT: A Training Schedule with a
Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z) - Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor behind language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)