Related papers: Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications

Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications

URL: http://arxiv.org/abs/2402.16367v3
Date: Sun, 06 Oct 2024 04:37:02 GMT
Title: Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications
Authors: Weize Liu, Yinlong Xu, Hongxia Xu, Jintai Chen, Xuming Hu, Jian Wu,
Abstract summary: We study the internal neuron activation patterns of large language models (LLMs) when processing different languages. We leverage the discovered differences in expert activation frequencies to guide sparse activation and pruning. Our findings offer new perspectives for applications such as sparse activation and model pruning.
Score: 24.18102112644796
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, large language models (LLMs) have achieved tremendous breakthroughs in the field of NLP, but still lack understanding of their internal neuron activities when processing different languages. We designed a method to convert dense LLMs into fine-grained MoE architectures, and then visually studied the multilingual activation patterns of LLMs through expert activation frequency heatmaps. Through comprehensive experiments on different model families, different model sizes, and different variants, we analyzed the similarities and differences in the internal neuron activation patterns of LLMs when processing different languages. Specifically, we investigated the distribution of high-frequency activated experts, multilingual shared experts, whether multilingual activation patterns are related to language families, and the impact of instruction tuning on activation patterns. We further explored leveraging the discovered differences in expert activation frequencies to guide sparse activation and pruning. Experimental results demonstrated that our method significantly outperformed random expert pruning and even exceeded the performance of unpruned models in some languages. Additionally, we found that configuring different pruning rates for different layers based on activation level differences could achieve better results. Our findings reveal the multilingual processing mechanisms within LLMs and utilize these insights to offer new perspectives for applications such as sparse activation and model pruning.

Related papers

Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering [61.0787902713059]
We propose a routing-guided steering method that adaptively guides routing behavior in middle layers toward shared experts associated with dominant languages at inference time.<n>Our code is available at http://conctsai.com/multilingualism-in-Mixture-of-Experts-LLMs.
arXiv Detail & Related papers (2026-01-20T15:04:25Z)
Multilingual Routing in Mixture-of-Experts [45.90403983668531]
We analyze expert routing patterns using parallel multilingual datasets.<n>We find that MoE models route tokens in language-specific ways in the early and late decoder layers.<n>We introduce a method that steers the router by promoting middle-layer task experts frequently activated in English.
arXiv Detail & Related papers (2025-10-06T11:09:20Z)
Language-Specific Layer Matters: Efficient Multilingual Enhancement for Large Vision-Language Models [60.39744129890118]
Large vision-language models (LVLMs) have demonstrated exceptional capabilities in understanding visual information with human languages.<n>In this work, we identify a salient correlation between the multilingual understanding ability of LVLMs and language-specific neuron activations in shallow layers.<n>We introduce PLAST, a training recipe that achieves efficient multilingual enhancement for LVLMs by Precise LAnguage-Specific layers fine-Tuning.
arXiv Detail & Related papers (2025-08-25T18:15:25Z)
The Emergence of Abstract Thought in Large Language Models Beyond Any Language [95.50197866832772]
Large language models (LLMs) function effectively across a diverse range of languages.<n>Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts.<n>Recent results show strong multilingual performance, even surpassing English performance on specific tasks in other languages.
arXiv Detail & Related papers (2025-06-11T16:00:54Z)
Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts [98.73585104789217]
We propose a layer-wise expert allocation algorithm (LayerMoE) to determine the appropriate number of new experts for each layer.<n>Our method outperforms the previous state-of-the-art baseline with 60% fewer experts in the single-expansion setting.
arXiv Detail & Related papers (2025-05-28T16:54:53Z)
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model [59.357993924917]
We study the evolution of multilingual capabilities in large language models (LLMs) during the pre-training process. We propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. We propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs.
arXiv Detail & Related papers (2024-12-10T08:28:57Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models [11.423589362950812]
Large language models (LLMs) have demonstrated remarkable performance, particularly in multilingual contexts. Recent studies suggest that LLMs can transfer skills learned in one language to others, but the internal mechanisms behind this ability remain unclear. This paper provides insights into the internal workings of LLMs, offering a foundation for future improvements in their cross-lingual capabilities.
arXiv Detail & Related papers (2024-10-15T15:49:15Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [85.0284555835015]
Large language models (LLMs) have revolutionized the field of natural language processing (NLP)<n>Few studies have attempted to explore the internal workings of LLMs in multilingual settings.<n>We classify neurons into four distinct categories based on their responses to a specific input across different languages.
arXiv Detail & Related papers (2024-06-13T16:04:11Z)
Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities. In this work, we investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
Understanding the role of FFNs in driving multilingual behaviour in LLMs [0.0]
In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models. We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing.
arXiv Detail & Related papers (2024-04-22T03:47:00Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs) We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback [61.83548032416181]
We present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
arXiv Detail & Related papers (2023-07-29T18:01:46Z)
Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities. We propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. The results show that GPT exhibits highly translating-like behaviour in multilingual settings.
arXiv Detail & Related papers (2023-05-24T02:05:03Z)
Multilingual Large Language Models Are Not (Yet) Code-Switchers [41.47534626749588]
Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks. The practice of alternating languages within an utterance remains relatively uncharted. We argue that current "multilingualism" in LLMs does not inherently imply proficiency with code-switching texts.
arXiv Detail & Related papers (2023-05-23T16:50:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.