Do Multilingual Language Models Think Better in English?
- URL: http://arxiv.org/abs/2308.01223v1
- Date: Wed, 2 Aug 2023 15:29:22 GMT
- Title: Do Multilingual Language Models Think Better in English?
- Authors: Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel
Artetxe
- Abstract summary: Translate-test is a popular technique to improve the performance of multilingual language models.
In this work, we introduce a new approach called self-translate, which overcomes the need for an external translation system.
- Score: 24.713751471567395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translate-test is a popular technique to improve the performance of
multilingual language models. This approach works by translating the input into
English using an external machine translation system, and running inference
over the translated input. However, these improvements can be attributed to the
use of a separate translation system, which is typically trained on large
amounts of parallel data not seen by the language model. In this work, we
introduce a new approach called self-translate, which overcomes the need for an
external translation system by leveraging the few-shot translation capabilities
of multilingual language models. Experiments over 5 tasks show that
self-translate consistently outperforms direct inference, demonstrating that
language models are unable to leverage their full multilingual potential when
prompted in non-English languages. Our code is available at
https://github.com/juletx/self-translate.
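The abstract describes a two-step pipeline: the same multilingual model first translates the non-English input into English via a few-shot prompt, and inference is then run over its own translation. Below is a minimal sketch of that idea, assuming a Hugging Face causal LM; the model name, few-shot translation examples, and the sentiment task prompt are illustrative placeholders rather than the paper's actual setup (the authors' code is at https://github.com/juletx/self-translate).

```python
# Minimal sketch of self-translate: the model translates its own input into
# English (few-shot), then the task prompt is run over the translated text.
# Model choice, prompts, and the sentiment task are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/xglm-564M"  # placeholder: any multilingual causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

FEW_SHOT_TRANSLATION = (
    "Spanish: El gato duerme en el sofá.\nEnglish: The cat sleeps on the sofa.\n"
    "Spanish: Me gusta leer por la noche.\nEnglish: I like reading at night.\n"
)

def generate(prompt: str, max_new_tokens: int = 40) -> str:
    """Greedy continuation of the prompt, returning only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

def self_translate(sentence: str) -> str:
    """Step 1: the model translates its own input into English via few-shot prompting."""
    prompt = f"{FEW_SHOT_TRANSLATION}Spanish: {sentence}\nEnglish:"
    return generate(prompt).split("\n")[0].strip()

def solve_task(sentence: str) -> str:
    """Step 2: run the downstream task prompt over the English translation."""
    english = self_translate(sentence)
    task_prompt = f"Review: {english}\nSentiment (positive or negative):"
    return generate(task_prompt, max_new_tokens=3).strip()

print(solve_task("La película fue una pérdida de tiempo."))
```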
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences from different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- Improving Language Model Integration for Neural Machine Translation [43.85486035238116]
We show that accounting for the implicit language model significantly boosts the performance of language model fusion.
arXiv Detail & Related papers (2023-06-08T10:00:19Z)
- XNLI 2.0: Improving XNLI dataset and performance on Cross Lingual Understanding (XLU) [0.0]
We focus on improving the original XNLI dataset by re-translating the MNLI dataset in all of the 14 different languages present in XNLI.
We also perform experiments by training models in all 15 languages and analyzing their performance on the task of natural language inference.
arXiv Detail & Related papers (2023-01-16T17:24:57Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z)
- Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead? [2.62121275102348]
We show that machine translation is a mature technology, which raises a serious counter-argument against training native language models for low-resource languages.
As English language models are improving at an unprecedented pace, which in turn improves machine translation, it is, from an empirical and environmental standpoint, more effective to translate data from low-resource languages into English.
arXiv Detail & Related papers (2021-04-21T10:21:24Z)
- Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution [16.939016405962526]
We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models.
Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts and incurs only minor degradation on the translation performance for the original language pairs.
arXiv Detail & Related papers (2021-03-11T17:10:21Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.