Related papers: How Vocabulary Sharing Facilitates Multilingualism in LLaMA?

How Vocabulary Sharing Facilitates Multilingualism in LLaMA?

URL: http://arxiv.org/abs/2311.09071v2
Date: Mon, 3 Jun 2024 06:11:06 GMT
Title: How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
Authors: Fei Yuan, Shuai Yuan, Zhiyong Wu, Lei Li,
Abstract summary: Large Language Models (LLMs) often show strong performance on English tasks, while exhibiting limitations on other languages. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective.
Score: 19.136382859468693
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs), often show strong performance on English tasks, while exhibiting limitations on other languages. What is an LLM's multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective by conducting an exhaustive analysis across 101 languages. Through the investigation of the performance gap before and after embedding fine-tuning, we discovered four distinct quadrants. By delving into each quadrant we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs based on these attributes of each quadrant~\footnote{\url{https://github.com/CONE-MT/Vocabulary-Sharing-Facilitates-Multilingualism}.}.

Related papers

Do Multilingual LLMs have specialized language heads? [0.571097144710995]
This paper explores whether multilingual LLMs have specialized language attention heads for each language.<n>It investigates the possibility of removing language-specific heads for unwanted languages without degrading performance in the targeted languages.
arXiv Detail & Related papers (2026-02-09T13:15:17Z)
AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment [46.881574083116085]
Multilingual large language models (LLMs) possess impressive multilingual understanding and generation capabilities.<n>LLMs' performance and cross-lingual alignment often lag for non-dominant languages.<n>We propose AlignX to bridge the multilingual performance gap, which is a two-stage representation-level framework.
arXiv Detail & Related papers (2025-09-29T06:37:46Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs) It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
Teaching LLMs to Abstain across Languages via Multilingual Feedback [40.84205285309612]
We show that multilingual feedback helps identify knowledge gaps across diverse languages, cultures, and communities. Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines. Further analysis reveals that multilingual feedback is both an effective and a more equitable abstain strategy to serve diverse language speakers.
arXiv Detail & Related papers (2024-06-22T21:59:12Z)
Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities. In this work, we investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training [29.47243668154796]
BLOOMZMMS is a novel model that integrates a multilingual LLM with a multilingual speech encoder. We demonstrate the transferability of linguistic knowledge from the text to the speech modality. Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks.
arXiv Detail & Related papers (2024-04-16T21:45:59Z)
Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models [79.46179534911019]
Large language models (LLMs) have demonstrated multilingual capabilities; yet, they are mostly English-centric due to imbalanced training corpora. This work extends the evaluation from NLP tasks to real user queries. For culture-related tasks that need deep language understanding, prompting in the native language tends to be more promising.
arXiv Detail & Related papers (2024-03-15T12:47:39Z)
How do Large Language Models Handle Multilingualism? [81.15060972112563]
This study explores how large language models (LLMs) handle multilingualism. LLMs initially understand the query, converting multilingual inputs into English for task-solving. In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures.
arXiv Detail & Related papers (2024-02-29T02:55:26Z)
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions [68.01449013641532]
Large-scale Pretrained Language Models (LLMs) have shown strong abilities in multilingual translations. We present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation.
arXiv Detail & Related papers (2023-05-24T12:00:24Z)
Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities. We propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. The results show that GPT exhibits highly translating-like behaviour in multilingual settings.
arXiv Detail & Related papers (2023-05-24T02:05:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.