Subword Segmentation and a Single Bridge Language Affect Zero-Shot
Neural Machine Translation
- URL: http://arxiv.org/abs/2011.01703v1
- Date: Tue, 3 Nov 2020 13:45:54 GMT
- Title: Subword Segmentation and a Single Bridge Language Affect Zero-Shot
Neural Machine Translation
- Authors: Annette Rios and Mathias Müller and Rico Sennrich
- Abstract summary: We investigate zero-shot performance of a multilingual EN$\leftrightarrow${FR,CS,DE,FI} system trained on WMT data.
We observe a bias towards copying the source in zero-shot translation, and investigate how the choice of subword segmentation affects this bias.
- Score: 36.4055239280145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot neural machine translation is an attractive goal because of the
high cost of obtaining data and building translation systems for new
translation directions. However, previous papers have reported mixed success in
zero-shot translation. It is hard to predict in which settings it will be
effective, and what limits performance compared to a fully supervised system.
In this paper, we investigate zero-shot performance of a multilingual
EN$\leftrightarrow${FR,CS,DE,FI} system trained on WMT data. We find that
zero-shot performance is highly unstable and can vary by more than 6 BLEU
between training runs, making it difficult to reliably track improvements. We
observe a bias towards copying the source in zero-shot translation, and
investigate how the choice of subword segmentation affects this bias. We find
that language-specific subword segmentation results in less subword copying at
training time, and leads to better zero-shot performance compared to jointly
trained segmentation. A recent trend in multilingual models is to not train on
parallel data between all language pairs, but have a single bridge language,
e.g. English. We find that this negatively affects zero-shot translation and
leads to a failure mode where the model ignores the language tag and instead
produces English output in zero-shot directions. We show that this bias towards
English can be effectively reduced with even a small amount of parallel data in
some of the non-English pairs.
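As a rough illustration of the segmentation comparison above, the sketch below trains one joint SentencePiece model over all five languages versus one model per language. It is a minimal sketch under assumed file names and vocabulary sizes, not the paper's actual configuration.
```python
"""Joint vs. language-specific subword segmentation. A minimal sketch;
file names and vocabulary sizes are illustrative assumptions."""
import sentencepiece as spm  # pip install sentencepiece

LANGS = ["en", "fr", "cs", "de", "fi"]

# Joint segmentation: a single subword model over the concatenated corpora.
spm.SentencePieceTrainer.train(
    input=",".join(f"train.{lang}" for lang in LANGS),  # assumed file naming
    model_prefix="spm.joint",
    vocab_size=32000,
)

# Language-specific segmentation: one subword model per language, trained
# only on that language's side of the data.
for lang in LANGS:
    spm.SentencePieceTrainer.train(
        input=f"train.{lang}",
        model_prefix=f"spm.{lang}",
        vocab_size=8000,
    )

# Each sentence is then segmented with the model of its own language, e.g.:
sp_de = spm.SentencePieceProcessor(model_file="spm.de.model")
print(sp_de.encode("Ein Beispielsatz.", out_type=str))
```
The paper's finding is that the per-language setup results in less subword copying at training time and better zero-shot scores; the vocabulary sizes above are placeholders only.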
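Both failure modes reported above, copying the source and emitting English instead of the requested target language, can be quantified directly on zero-shot output. The sketch below is one possible way to do so, using langid for language identification and sentence-level BLEU against the source as a copy indicator; the tool choices and the copy threshold are assumptions, not the paper's methodology.
```python
"""Diagnostics for zero-shot output: off-target rate (wrong output language,
often English) and source-copy rate. A minimal sketch; the libraries and the
BLEU threshold for calling a hypothesis a "copy" are assumptions."""
import langid                       # pip install langid
from sacrebleu.metrics import BLEU  # pip install sacrebleu

bleu = BLEU(effective_order=True)   # effective order for sentence-level BLEU
COPY_THRESHOLD = 50.0               # assumed cut-off for "mostly copied"

def zero_shot_diagnostics(sources, hypotheses, target_lang):
    off_target = copies = 0
    for src, hyp in zip(sources, hypotheses):
        # Off-target: the model ignored the language tag (e.g. produced English).
        if langid.classify(hyp)[0] != target_lang:
            off_target += 1
        # Copy bias: the hypothesis overlaps heavily with the source sentence.
        if bleu.sentence_score(hyp, [src]).score > COPY_THRESHOLD:
            copies += 1
    n = len(hypotheses)
    return {"off_target_rate": off_target / n, "copy_rate": copies / n}

# Example: a French->German zero-shot hypothesis checked against its source.
print(zero_shot_diagnostics(
    ["Le chat dort sur le canapé."],
    ["Die Katze schläft auf dem Sofa."],
    target_lang="de",
))
```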
Related papers
- How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual
Translation via Tiny Multi-Parallel Data [10.286714403840355]
A common, albeit resource-consuming, solution is to add as many related translation directions as possible to the training corpus.
We show that for an English-centric model, surprisingly large zero-shot improvements can be achieved by simply fine-tuning with a very small amount of multi-parallel data (a data-mixing sketch along these lines follows the list below).
arXiv Detail & Related papers (2024-01-22T23:55:00Z)
- Question Translation Training for Better Multilingual Reasoning [108.10066378240879]
Large language models show compelling performance on reasoning tasks but they tend to perform much worse in languages other than English.
A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training.
In this paper we explore the benefits of question alignment, where we train the model to translate reasoning questions into English by finetuning on X-English parallel question data.
arXiv Detail & Related papers (2024-01-15T16:39:10Z)
- Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles [53.92189950211852]
Large language models have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning.
In this paper, we investigate the factors contributing to the gap between zero- and few-shot performance and find that this gap can largely be closed (by about 70%) by matching the writing styles of the target corpus.
arXiv Detail & Related papers (2023-11-04T03:18:45Z)
- Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages [0.34998703934432673]
We build a single-decoder neural machine translation system for Dravidian-Dravidian multilingual translation.
Our model achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50% of the language directions.
arXiv Detail & Related papers (2023-08-10T13:38:09Z)
- On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation [104.85258654917297]
We find that failure to encode a discriminative target-language signal leads to off-target translation, and that a closer lexical distance between languages is associated with a higher off-target rate.
We propose Language Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
arXiv Detail & Related papers (2023-05-18T12:43:31Z)
- Language Agnostic Multilingual Information Retrieval with Contrastive Learning [59.26316111760971]
We present an effective method to train multilingual information retrieval systems.
We leverage parallel and non-parallel corpora to improve the pretrained multilingual language models.
Our model can work well even with a small number of parallel sentences.
arXiv Detail & Related papers (2022-10-12T23:53:50Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables [28.101782382170306]
We introduce a denoising autoencoder objective based on the pivot language into the traditional training objective to improve translation accuracy on zero-shot directions.
We demonstrate that the proposed method effectively eliminates the spurious correlations and significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-09-10T07:18:53Z)
- Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor causing the language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
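Both the paper above and the first related entry ("How Far Can 100 Samples Go?") find that even a small amount of non-English parallel data counteracts the bias towards English output in zero-shot directions. The sketch below shows how such a corpus might be assembled for a tag-based multilingual model; the <2xx> target-language tag convention, file layout, and 100-pair sample size are illustrative assumptions rather than details from either paper.
```python
"""Assemble an English-centric, target-tagged training corpus and mix in a
tiny amount of non-English parallel data. A minimal sketch; the <2xx> tag,
file layout, and 100-pair sample size are assumptions, not paper details."""
import random

def tagged_pairs(src_path, tgt_path, tgt_lang):
    """Yield (source, target) pairs with a target-language tag prepended."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            yield f"<2{tgt_lang}> {s.strip()}", t.strip()

corpus = []

# English-centric supervised directions (the single-bridge setup).
for lang in ["fr", "cs", "de", "fi"]:
    corpus += tagged_pairs(f"train.en-{lang}.en", f"train.en-{lang}.{lang}", lang)
    corpus += tagged_pairs(f"train.en-{lang}.{lang}", f"train.en-{lang}.en", "en")

# A small multi-parallel sample for one non-English pair (size is illustrative).
fr_de = list(tagged_pairs("tiny.fr-de.fr", "tiny.fr-de.de", "de"))
corpus += random.sample(fr_de, k=min(100, len(fr_de)))

random.shuffle(corpus)
```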