Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability
- URL: http://arxiv.org/abs/2306.06688v1
- Date: Sun, 11 Jun 2023 14:03:09 GMT
- Title: Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability
- Authors: Jiacheng Ye, Xijia Tao, Lingpeng Kong
- Abstract summary: We conduct experiments across four types of reasoning tasks.
We find that the multilingual pretrained model does not always outperform an English-centric model.
English appears to be a less suitable source language, and the choice of source language becomes less important when the English-centric model scales up.
- Score: 11.000499414131324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual transfer ability, which reflects how well the models fine-tuned
on one source language can be applied to other languages, has been well studied
in multilingual pre-trained models (e.g., BLOOM). However, such ability has not
been investigated for English-centric models (e.g., LLaMA). To fill this gap,
we study the following research questions. First, does multilingual transfer
ability exist in English-centric models and how does it compare with
multilingual pretrained models? Second, does it only appear when English is
the source language for the English-centric model? Third, how does it vary in
different tasks? We take multilingual reasoning ability as our focus and
conduct extensive experiments across four types of reasoning tasks. We find
that the multilingual pretrained model does not always outperform an
English-centric model. Furthermore, English appears to be a less suitable
source language, and the choice of source language becomes less important when
the English-centric model scales up. In addition, different types of tasks
exhibit different multilingual transfer abilities. These findings demonstrate
that English-centric models not only possess multilingual transfer ability but
may even surpass the transferability of multilingual pretrained models if
well-trained. By showing the strengths and weaknesses, the experiments also
provide valuable insights into enhancing the multilingual reasoning abilities
of English-centric models.
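For concreteness, the following is a minimal sketch of the transfer protocol studied here: fine-tune on a single source language, then evaluate zero-shot on other languages. The model (xlm-roberta-base), dataset (XNLI), language choices, and hyperparameters are illustrative assumptions, not the paper's actual configuration, which uses larger English-centric and multilingual models on four types of reasoning tasks.

```python
# Minimal sketch of the cross-lingual transfer setup: fine-tune on one source
# language, then evaluate zero-shot on other languages. Model, dataset (XNLI),
# languages, and hyperparameters are illustrative stand-ins only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "xlm-roberta-base"    # swap in an English-centric or multilingual model
SOURCE, TARGETS = "de", ["en", "es", "zh"]

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

def encode(batch):
    # Pad to a fixed length so the default collator can batch the examples.
    return tok(batch["premise"], batch["hypothesis"],
               truncation=True, padding="max_length", max_length=128)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((logits.argmax(-1) == labels).mean())}

train = load_dataset("xnli", SOURCE, split="train[:2000]").map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt", per_device_train_batch_size=16,
                           num_train_epochs=1, report_to=[]),
    train_dataset=train,
    compute_metrics=accuracy,
)
trainer.train()  # fine-tune on the source language only

# Zero-shot evaluation on each target language measures transfer ability.
for lang in TARGETS:
    test = load_dataset("xnli", lang, split="test[:500]").map(encode, batched=True)
    print(lang, trainer.evaluate(eval_dataset=test))
```

Swapping MODEL between an English-centric and a multilingual checkpoint, and varying SOURCE, mirrors the comparisons the paper makes.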
Related papers
- Could We Have Had Better Multilingual LLMs If English Was Not the Central Language? [4.655168524016426]
Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on.
Our study delves into Llama2's translation capabilities.
Our experiments show that the 7B Llama2 model yields above 10 BLEU when translating into all languages it has seen.
arXiv Detail & Related papers (2024-02-21T16:32:38Z) - Multilingual BERT has an accent: Evaluating English influences on
fluency in multilingual models [23.62852626011989]
We show that grammatical structures in higher-resource languages bleed into lower-resource languages.
We show this bias via a novel method for comparing the fluency of multilingual models to the fluency of monolingual Spanish and Greek models.
arXiv Detail & Related papers (2022-10-11T17:06:38Z) - MonoByte: A Pool of Monolingual Byte-level Language Models [4.491765479948667]
We release 10 monolingual byte-level models rigorously pretrained under the same configuration.
Because they are tokenizer-free, the problem of unseen token embeddings is eliminated (a minimal byte-level encoding sketch appears after this list).
Experiments on QA and NLI tasks show that our monolingual models achieve performance competitive with the multilingual one.
arXiv Detail & Related papers (2022-09-22T14:32:48Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken throughout XLM-R pretraining using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z) - Same Neurons, Different Languages: Probing Morphosyntax in Multilingual
Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z) - Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using a single dense retrieval design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z) - Do Multilingual Language Models Capture Differing Moral Norms? [71.52261949766101]
Massively multilingual sentence representations are trained on large corpora of uncurated data.
This may cause the models to grasp cultural values including moral judgments from the high-resource languages.
The lack of data in certain languages can also lead the models to develop arbitrary and thus potentially harmful beliefs.
arXiv Detail & Related papers (2022-03-18T12:26:37Z) - On the ability of monolingual models to learn language-agnostic
representations [2.604227467422371]
We show that monolingual models pretrained and finetuned on different languages achieve competitive performance.
For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.
arXiv Detail & Related papers (2021-09-04T22:09:44Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group a representation sprachbund (a clustering sketch appears after this list).
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer [39.360667403003745]
Zero-shot cross-lingual transfer is emerging as a practical solution.
English is the dominant source language for transfer, as reinforced by popular zero-shot benchmarks.
We find that other high-resource languages such as German and Russian often transfer more effectively.
arXiv Detail & Related papers (2021-06-30T16:05:57Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.