Towards a Common Understanding of Contributing Factors for Cross-Lingual
Transfer in Multilingual Language Models: A Review
- URL: http://arxiv.org/abs/2305.16768v1
- Date: Fri, 26 May 2023 09:31:12 GMT
- Title: Towards a Common Understanding of Contributing Factors for Cross-Lingual
Transfer in Multilingual Language Models: A Review
- Authors: Fred Philippy, Siwen Guo, Shohreh Haddadan
- Abstract summary: Pre-trained Multilingual Language Models (MLLMs) have shown a strong ability to transfer knowledge across different languages.
It is challenging to obtain a unique and straightforward explanation for the emergence of this ability.
This review provides, first, an aligned reference point for future research and, second, guidance for a better-informed and more efficient way of leveraging the cross-lingual capacity of MLLMs.
- Score: 2.578242050187029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, pre-trained Multilingual Language Models (MLLMs) have shown
a strong ability to transfer knowledge across different languages. However,
given that the aspiration for such an ability has not been explicitly
incorporated in the design of the majority of MLLMs, it is challenging to
obtain a unique and straightforward explanation for its emergence. In this
review paper, we survey literature that investigates different factors
contributing to the capacity of MLLMs to perform zero-shot cross-lingual
transfer and subsequently outline and discuss these factors in detail. To
enhance the structure of this review and to facilitate consolidation with
future studies, we identify five categories of such factors. In addition to
providing a summary of empirical evidence from past studies, we identify
consensuses among studies with consistent findings and resolve conflicts among
contradictory ones. Our work contextualizes and unifies existing research
streams which aim at explaining the cross-lingual potential of MLLMs. This
review provides, first, an aligned reference point for future research and,
second, guidance for a better-informed and more efficient way of leveraging the
cross-lingual capacity of MLLMs.
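The setting studied throughout the review is zero-shot cross-lingual transfer: an MLLM is fine-tuned on labeled data in a source language and then applied directly to a target language with no target-language labels. As an illustration only (not code from the paper), a minimal sketch of this setting with the Hugging Face transformers library might look as follows; the checkpoint name, toy data, and hyperparameters are assumptions.

```python
# Minimal sketch of zero-shot cross-lingual transfer with a pre-trained
# multilingual encoder. Illustrative only; model choice and toy data are assumed.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # any MLLM checkpoint could be used here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled data in the source language (English) only.
train_texts = ["The movie was wonderful.", "The movie was terrible."]
train_labels = torch.tensor([1, 0])

# Fine-tune on the source language.
model.train()
optimizer = AdamW(model.parameters(), lr=2e-5)
for _ in range(3):
    batch = tokenizer(train_texts, padding=True, return_tensors="pt")
    loss = model(**batch, labels=train_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation on the target language (German): no German labels were used.
model.eval()
test_texts = ["Der Film war wunderbar.", "Der Film war schrecklich."]
with torch.no_grad():
    logits = model(**tokenizer(test_texts, padding=True, return_tensors="pt")).logits
print(logits.argmax(dim=-1))  # ideally [1, 0], i.e. the classifier transfers across languages
```

The factors the review categorizes (e.g. shared vocabulary, typological similarity, pre-training data size) are candidate explanations for why such transfer succeeds or fails in this kind of setup.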
Related papers
- 1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators? [46.43162333819418]
Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages.
Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement.
This paper introduces a method to enhance the multilingual performance of LLMs by aggregating knowledge from diverse languages.
arXiv Detail & Related papers (2024-06-20T20:32:53Z)
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
- Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.47046536073682]
We present a review and a unified perspective that summarizes recent progress and emerging trends in the literature on multilingual large language models (MLLMs).
We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
arXiv Detail & Related papers (2024-04-07T11:52:44Z)
- A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias [5.104497013562654]
We present an overview of MLLMs, covering their evolution, key techniques, and multilingual capacities.
We explore widely utilized multilingual corpora for MLLM training and multilingual datasets oriented toward downstream tasks.
We discuss bias in MLLMs, including its categories and evaluation metrics, and summarize existing debiasing techniques.
arXiv Detail & Related papers (2024-04-01T05:13:56Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer [49.298360366468934]
We focus on how to effectively transfer the capabilities of language generation and following instructions to a non-English language.
We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer.
We employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench.
arXiv Detail & Related papers (2024-01-02T06:29:02Z)
- A Survey on Multimodal Large Language Models [71.63375558033364]
Multimodal Large Language Models (MLLMs), represented by GPT-4V, have become a new and rising research hotspot.
This paper aims to trace and summarize the recent progress of MLLMs.
arXiv Detail & Related papers (2023-06-23T15:21:52Z)
- CMMLU: Measuring massive multitask language understanding in Chinese [133.70911295934746]
This paper introduces a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities.
CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
arXiv Detail & Related papers (2023-06-15T15:49:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.