How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual
Translation via Tiny Multi-Parallel Data
- URL: http://arxiv.org/abs/2401.12413v2
- Date: Mon, 26 Feb 2024 23:23:31 GMT
- Title: How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual
Translation via Tiny Multi-Parallel Data
- Authors: Di Wu, Shaomu Tan, Yan Meng, David Stap and Christof Monz
- Abstract summary: A common, albeit resource-consuming, solution is to add as many related translation directions as possible to the training corpus.
We show that for an English-centric model, surprisingly large zero-shot improvements can be achieved by simply fine-tuning with a very small amount of multi-parallel data.
- Score: 10.286714403840355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot translation aims to translate between language pairs not seen
during training in Multilingual Machine Translation (MMT) and is largely
considered an open problem. A common, albeit resource-consuming, solution is to
add as many related translation directions as possible to the training corpus.
In this paper, we show that for an English-centric model, surprisingly large
zero-shot improvements can be achieved by simply fine-tuning with a very small
amount of multi-parallel data. For example, on the EC30 dataset, we obtain up
to +21.7 ChrF improvement in overall non-English performance (870 directions)
by using only 100 multi-parallel samples, while preserving English-centric
translation quality. When investigating the size effect of fine-tuning data
and its transfer capabilities, we find that even a small, randomly sampled set
of fine-tuning directions is sufficient to achieve comparable improvements. The
resulting non-English performance is close to the complete translation upper
bound. Even in a minimal setting -- fine-tuning with only a single sample --
the well-known off-target issue is almost completely resolved, explaining
part -- but not all -- of the observed improvements in translation quality.
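To make the recipe concrete, below is a minimal sketch of how a tiny multi-parallel set expands into fine-tuning pairs for all translation directions. The toy corpus, the `build_finetuning_pairs` helper, and the counts in the comments are illustrative assumptions derived from the abstract, not the authors' released code or data.

```python
from itertools import permutations

# Toy multi-parallel corpus: each sample is one sentence available in every
# language. In the paper's setting roughly 100 such samples are used; the
# sentences and the language set below are placeholders.
multi_parallel = [
    {"en": "Good morning.", "de": "Guten Morgen.", "fr": "Bonjour.", "nl": "Goedemorgen."},
    {"en": "Thank you.", "de": "Danke.", "fr": "Merci.", "nl": "Dank u."},
]

def build_finetuning_pairs(samples):
    """Expand multi-parallel samples into (src_lang, tgt_lang, src, tgt) tuples
    for every ordered language pair, including the non-English directions that
    an English-centric model never sees during training."""
    pairs = []
    for sample in samples:
        for src_lang, tgt_lang in permutations(sample.keys(), 2):
            pairs.append((src_lang, tgt_lang, sample[src_lang], sample[tgt_lang]))
    return pairs

pairs = build_finetuning_pairs(multi_parallel)
print(len(pairs))  # 2 samples x (4 x 3) ordered directions = 24 pairs

# With 100 samples over English plus the 30 non-English languages of EC30,
# the same expansion gives 100 x 31 x 30 = 93,000 pairs spanning 930
# directions (870 of them non-English). Per the abstract, even a small random
# subset of these directions suffices.
```

From here, the pairs would be formatted with the model's language tags and used for a short fine-tuning pass on the pretrained English-centric model; per the abstract, even the one-sample version of this expansion already removes most off-target outputs.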
Related papers
- How Multilingual Are Large Language Models Fine-Tuned for Translation? [13.612090779277281]
Fine-tuning large language models (LLMs) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data.
How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English?
We find that translation fine-tuning improves translation quality even for zero-shot languages on average, but that the impact is uneven depending on the language pairs involved.
arXiv Detail & Related papers (2024-05-30T22:08:20Z)
- Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles [53.92189950211852]
Large language models have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning.
In this paper, we investigate the factors contributing to this gap and find that it can largely be closed (by about 70%) by matching the writing styles of the target corpus.
arXiv Detail & Related papers (2023-11-04T03:18:45Z)
- Adapting to Non-Centered Languages for Zero-shot Multilingual Translation [12.487990897680422]
We propose a simple, lightweight yet effective language-specific modeling method by adapting to non-centered languages.
Experiments with Transformer on IWSLT17, Europarl, TED talks, and OPUS-100 datasets show that our method can easily fit non-centered data conditions.
arXiv Detail & Related papers (2022-09-09T06:34:12Z)
- OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves state-of-the-art results.
We conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size.
arXiv Detail & Related papers (2022-05-17T19:52:42Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation [36.4055239280145]
We investigate zero-shot performance of a multilingual EN↔{FR,CS,DE,FI} system trained on WMT data.
We observe a bias towards copying the source in zero-shot translation, and investigate how the choice of subword segmentation affects this bias.
arXiv Detail & Related papers (2020-11-03T13:45:54Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems from WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs (a schematic sketch of this idea appears below, after the list).
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
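For the random online backtranslation idea mentioned in the last entry above, the following is a schematic sketch reconstructed from the one-sentence summary, not the authors' implementation; `translate` is a stand-in for decoding with the current model checkpoint, and the language list and batch format are assumptions.

```python
import random

NON_ENGLISH = ["de", "fr", "nl", "cs"]  # toy language set, not the paper's

def translate(model, sentence, src_lang, tgt_lang):
    # Stand-in for beam/greedy decoding with the current model; tagging the
    # sentence keeps the sketch runnable without a real NMT model.
    return f"<{tgt_lang}> {sentence}"

def robt_step(model, batch, update_fn):
    """One training step with random online backtranslation (schematic):
    for every (src_lang, tgt_lang, src, tgt) pair, pick a random language,
    back-translate the target side into it with the current model, and add
    the synthetic pair so otherwise unseen (zero-shot) directions receive a
    direct training signal."""
    augmented = list(batch)  # keep the original supervised pairs
    for src_lang, tgt_lang, src, tgt in batch:
        pivot = random.choice([l for l in NON_ENGLISH if l != tgt_lang])
        synthetic_src = translate(model, tgt, tgt_lang, pivot)
        augmented.append((pivot, tgt_lang, synthetic_src, tgt))
    update_fn(model, augmented)  # ordinary NMT update on original + synthetic

# Dummy usage: prints the augmented batch instead of updating a real model.
batch = [("en", "de", "Good morning.", "Guten Morgen.")]
robt_step(model=None, batch=batch, update_fn=lambda m, b: print(b))
```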
This list is automatically generated from the titles and abstracts of the papers on this site.