Related papers: Do Multilingual LLMs have specialized language heads?

Do Multilingual LLMs have specialized language heads?

URL: http://arxiv.org/abs/2602.08625v1
Date: Mon, 09 Feb 2026 13:15:17 GMT
Title: Do Multilingual LLMs have specialized language heads?
Authors: Muhammad Naufil,
Abstract summary: This paper explores whether multilingual LLMs have specialized language attention heads for each language.<n>It investigates the possibility of removing language-specific heads for unwanted languages without degrading performance in the targeted languages.
Score: 0.571097144710995
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multilingual large language models (LLMs) have gained significant popularity for their ability to process and generate text across multiple languages. However, deploying these models in production can be inefficient when only a subset of the supported languages is of interest. There has been some research conducted on identifying whether machine translation models have language-specific or language-agnostic heads, however no research has been conducted for multilingual LLMs, to the best of our knowledge, that as we know are capable of performing diverse tasks beyond just translation. This paper explores whether multilingual LLMs have specialized language attention heads for each language, and investigates the possibility of removing language-specific heads for unwanted languages without degrading performance in the targeted languages. Our findings could inform more efficient deployment strategies for multilingual LLMs, enabling reduced model complexity while maintaining high accuracy for targeted languages.

Related papers

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders [51.380449540006985]
Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear.<n>Do they form shared multilingual representations with language-specific decoding, and if so, why does performance still favor the dominant training language?<n>We analyze their internal mechanisms using cross-layer transcoders (CLT) and attribution graphs.
arXiv Detail & Related papers (2025-11-13T22:51:06Z)
Linguistically-Informed Multilingual Instruction Tuning: Is There an Optimal Set of Languages to Tune? [0.0]
This study proposes a method to select languages for instruction tuning in a linguistically informed way. We use a simple algorithm to choose diverse languages and test their effectiveness on various benchmarks and open-ended questions. Our results show that this careful selection generally leads to better outcomes than choosing languages at random.
arXiv Detail & Related papers (2024-10-10T10:57:24Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
We propose Lens, a novel approach to enhance multilingual capabilities in large language models (LLMs)<n>Lens operates on two subspaces: the language-agnostic subspace, where it aligns target languages with the central language to inherit strong semantic representations, and the language-specific subspace, where it separates target and central languages to preserve linguistic specificity.<n>Lens significantly improves multilingual performance while maintaining the model's English proficiency, achieving better results with less computational cost compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.<n>We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.<n>We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.<n>But can these models relate corresponding concepts across languages, i.e., be crosslingual?<n>This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities. In this work, we investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation [25.850573463743352]
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks. Yet significant performance disparities exist across different languages within the same mPLM. We introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM.
arXiv Detail & Related papers (2024-04-12T14:19:16Z)
Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages [60.162717568496355]
Large language models (LLMs) have been pre-trained on multilingual corpora. Their performance still lags behind in most languages compared to a few resource-rich languages.
arXiv Detail & Related papers (2024-02-19T15:07:32Z)
How Vocabulary Sharing Facilitates Multilingualism in LLaMA? [19.136382859468693]
Large Language Models (LLMs) often show strong performance on English tasks, while exhibiting limitations on other languages. This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective.
arXiv Detail & Related papers (2023-11-15T16:13:14Z)
Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.