Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
- URL: http://arxiv.org/abs/2510.13580v1
- Date: Wed, 15 Oct 2025 14:14:49 GMT
- Title: Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
- Authors: Daniil Gurgurov, Josef van Genabith, Simon Ostermann
- Abstract summary: Large language models exhibit uneven performance across languages. We present a framework for enhancing monolingual capabilities of LLMs in underrepresented languages. Our approach identifies language-specific neurons using Language Activation Probability Entropy and fine-tunes only the weights associated with these neurons.
- Score: 11.719190735841407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models exhibit uneven performance across languages, with substantial gaps between high- and low-resource languages. We present a framework for enhancing monolingual capabilities of LLMs in underrepresented languages while preserving their general-purpose performance through targeted fine-tuning of language-specific subnetworks. Our approach identifies language-specific neurons using Language Activation Probability Entropy and fine-tunes only the weights associated with these neurons, a dedicated subnetwork, on target-language data. Experiments on Llama-3.1-8B and Mistral-Nemo-12B across 12 mid- and low-resource languages demonstrate that our method consistently outperforms full fine-tuning, FFN-only fine-tuning, LoRA adaptation, and random subset fine-tuning baselines while efficiently updating only up to 1% of model parameters. Beyond performance improvements, we observe favorable training dynamics, cross-lingual representational alignment, and systematic weight-update changes. To facilitate future research, we release language-specific neuron identifications for over 100 languages as well as our adaptation pipeline, offering a cost-effective pathway for adapting state-of-the-art models to underrepresented languages.
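The core mechanism described in the abstract (identify low-entropy, language-selective neurons, then fine-tune only the weights feeding them) can be illustrated with a short sketch. This is a minimal illustration under assumed tensor shapes and a top-k selection rule, not the authors' released pipeline; the names `lape_scores`, `select_language_neurons`, and the gradient-mask hook are illustrative choices.

```python
import torch

def lape_scores(act_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Language Activation Probability Entropy (LAPE) per neuron.

    act_probs: [num_languages, num_neurons], where each entry is the empirical
    probability that the neuron fires (activation > 0) on tokens of that language.
    Low entropy means the neuron activates mostly for a few languages.
    """
    p = act_probs / (act_probs.sum(dim=0, keepdim=True) + eps)  # normalise over languages
    return -(p * (p + eps).log()).sum(dim=0)                    # entropy per neuron

def select_language_neurons(act_probs: torch.Tensor, target_lang: int, top_k: int) -> torch.Tensor:
    """Pick the top_k lowest-entropy neurons whose preferred language is the target."""
    entropy = lape_scores(act_probs)
    candidates = torch.nonzero(act_probs.argmax(dim=0) == target_lang, as_tuple=True)[0]
    order = entropy[candidates].argsort()
    return candidates[order[:top_k]]

def restrict_updates_to_neurons(ffn_up_proj: torch.nn.Linear, neuron_ids: torch.Tensor) -> None:
    """Mask gradients so only the rows producing the selected neurons are updated;
    all other weights in the layer remain effectively frozen."""
    mask = torch.zeros_like(ffn_up_proj.weight)
    mask[neuron_ids] = 1.0
    ffn_up_proj.weight.register_hook(lambda grad: grad * mask)
```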
Related papers
- Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation [4.2178072320683375]
We investigate scaling as an efficient strategy for adapting pretrained models to new target languages. We find that, once exposed to sufficient target-language data, larger upscaled models can match or surpass the performance of smaller, continually pretrained models. Finally, we explore whether such scaled, language-specific models can be merged to construct modular and flexible multilingual systems.
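The merging step mentioned above can be approximated by simple parameter averaging of language-specific checkpoints; a minimal sketch follows. Plain weighted averaging is one common recipe and an assumption here, not necessarily the paper's exact merging method.

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Merge several language-specific checkpoints by weighted parameter averaging,
    one common recipe for building a single multilingual model from monolingual experts."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }
```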
arXiv Detail & Related papers (2025-12-11T16:09:54Z) - Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings [1.1556013985948772]
We evaluate the transferability of pre-trained language models to low-resource Indonesian local languages. We group the target languages into three categories: seen, partially seen, and unseen. Multilingual models perform best on seen languages, moderately on partially seen ones, and poorly on unseen languages. We find that MAD-X significantly improves performance, especially for seen and partially seen languages, without requiring labeled data in the target language.
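MAD-X plugs small, language-specific adapter modules into a frozen multilingual model. A minimal PyTorch sketch of such a bottleneck adapter is given below; the bottleneck size and placement are illustrative assumptions, and in practice the AdapterHub tooling handles insertion into each transformer layer.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """MAD-X-style language adapter: a small residual bottleneck inserted after a
    frozen transformer sub-layer; only these parameters are trained per language."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))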
arXiv Detail & Related papers (2025-07-02T12:17:55Z) - Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
We propose Lens, a novel approach to enhance multilingual capabilities in large language models (LLMs). Lens operates on two subspaces: the language-agnostic subspace, where it aligns target languages with the central language to inherit strong semantic representations, and the language-specific subspace, where it separates target and central languages to preserve linguistic specificity. Lens significantly improves multilingual performance while maintaining the model's English proficiency, achieving better results with less computational cost compared to existing post-training approaches.
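A rough sketch of the two objectives implied by this description: alignment in the language-agnostic subspace and separation in the language-specific subspace. The projection bases and loss forms below are assumptions made for illustration, not Lens's exact formulation.

```python
import torch
import torch.nn.functional as F

def lens_style_losses(h_target, h_central, basis_agnostic, basis_specific):
    """h_target / h_central: [batch, hidden] pooled representations of parallel text
    in the target and central (e.g. English) language.
    basis_agnostic / basis_specific: [hidden, k] projection bases for the two subspaces
    (how they are obtained is paper-specific; here they are simply assumed inputs)."""
    # Align the target language with the central language in the language-agnostic subspace.
    align = F.mse_loss(h_target @ basis_agnostic, h_central @ basis_agnostic)
    # Keep the two languages apart in the language-specific subspace.
    separate = -F.mse_loss(h_target @ basis_specific, h_central @ basis_specific)
    return align, separate
```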
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Targeted Multilingual Adaptation for Low-resource Language Families [17.212424929235624]
We study best practices for adapting a pre-trained model to a language family.
Our adapted models significantly outperform mono- and multilingual baselines.
Low-resource languages can be aggressively up-sampled during training at little detriment to performance in high-resource languages.
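Aggressive up-sampling of low-resource languages is commonly implemented with temperature-based sampling over corpus sizes; the sketch below shows the idea. The temperature value and the exact sampling scheme used in the paper are not stated here, so this is an assumed recipe.

```python
def sampling_weights(corpus_sizes, temperature=0.3):
    """Temperature-based up-sampling: raise per-language corpus proportions to a
    power < 1 so low-resource languages are sampled more often during training."""
    total = sum(corpus_sizes.values())
    scaled = {lang: (n / total) ** temperature for lang, n in corpus_sizes.items()}
    norm = sum(scaled.values())
    return {lang: p / norm for lang, p in scaled.items()}

# A 100x corpus-size gap shrinks to roughly a 4x sampling gap at temperature 0.3.
print(sampling_weights({"en": 1_000_000, "sw": 10_000}))
```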
arXiv Detail & Related papers (2024-05-20T23:38:06Z) - On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
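Prompt tuning in this setting trains only a handful of virtual token embeddings prepended to the input while the multilingual model itself stays frozen. A minimal sketch follows; the number of virtual tokens and the initialization scale are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Prompt tuning: learn a few virtual token embeddings that are prepended to the
    input embeddings; the multilingual LLM itself stays frozen."""
    def __init__(self, num_virtual_tokens: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        prompt = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```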
arXiv Detail & Related papers (2023-11-14T00:43:33Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Parameter-Efficient Cross-lingual Transfer of Vision and Language Models
via Translation-based Alignment [31.885608173448368]
Pre-trained vision and language models such as CLIP have witnessed remarkable success in connecting images and texts with a primary focus on English texts.
However, disparities in performance among different languages have been observed due to uneven resource availability.
We propose a new parameter-efficient cross-lingual transfer learning framework that utilizes a translation-based alignment method to mitigate multilingual disparities.
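A translation-based alignment objective of this kind typically pulls the embedding of a translated caption toward the frozen English CLIP text embedding of the original caption. The cosine-distance form below is an assumed illustration, not necessarily the paper's exact loss.

```python
import torch.nn.functional as F

def translation_alignment_loss(student_emb, teacher_emb):
    """Pull the multilingual text embedding of a translated caption toward the frozen
    English CLIP text embedding of the original caption (cosine distance)."""
    student = F.normalize(student_emb, dim=-1)
    teacher = F.normalize(teacher_emb, dim=-1)
    return (1.0 - (student * teacher).sum(dim=-1)).mean()
```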
arXiv Detail & Related papers (2023-05-02T14:09:02Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
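One common, data-efficient way to handle unseen scripts is to extend the tokenizer and embedding matrix and initialize the new rows from existing embeddings before continued training. The sketch below shows this with Hugging Face Transformers; the model name and placeholder tokens are assumptions, and the paper's own methods may differ.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Extend the vocabulary with tokens from an unseen script, resize the embedding
# matrix, and initialise the new rows from the mean of existing embeddings before
# continued pretraining on target-language text.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

new_tokens = ["ⶓ", "ⶖ"]  # placeholder characters standing in for an unseen script
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    emb[-len(new_tokens):] = emb[:-len(new_tokens)].mean(dim=0)
```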
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language
Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
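The mutual-information-based decomposition can be sketched with a MINE-style (Donsker-Varadhan) estimator whose estimate is minimized between the domain-invariant and domain-specific halves of a representation. This is an illustrative stand-in for the general mechanism, not the paper's exact estimator or training setup.

```python
import torch
import torch.nn as nn

class MINEEstimator(nn.Module):
    """Donsker-Varadhan mutual information estimator between the domain-invariant
    and domain-specific parts of a representation; minimising the estimate pushes
    the two parts to share as little information as possible."""
    def __init__(self, dim_a: int, dim_b: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a + dim_b, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        joint = self.net(torch.cat([a, b], dim=-1)).mean()
        b_shuffled = b[torch.randperm(b.size(0))]
        log_marginal = torch.logsumexp(
            self.net(torch.cat([a, b_shuffled], dim=-1)), dim=0
        ) - torch.log(torch.tensor(float(b.size(0))))
        return joint - log_marginal.squeeze()  # lower bound on I(a; b)
```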
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
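A minimal zero-shot evaluation loop over one XCOPA language, assuming the Hugging Face `datasets` release of the benchmark; the config name `"et"` and the trivial placeholder scorer are assumptions kept only to make the sketch runnable.

```python
from datasets import load_dataset

# Zero-shot evaluation skeleton for one XCOPA language (Estonian, config "et").
xcopa = load_dataset("xcopa", "et", split="validation")

def plausibility(premise: str, question: str, choice: str) -> float:
    # Placeholder heuristic; replace with a real model's score (e.g. log-likelihood).
    return -abs(len(choice) - len(premise))

correct = 0
for ex in xcopa:
    scores = [plausibility(ex["premise"], ex["question"], ex[c]) for c in ("choice1", "choice2")]
    correct += int(scores.index(max(scores)) == ex["label"])
print(f"accuracy: {correct / len(xcopa):.3f}")
```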
arXiv Detail & Related papers (2020-05-01T12:22:33Z) - Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
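Random online backtranslation can be sketched as follows: for each target sentence in a batch, pick a random intermediate language, translate the target into it with the current model, and train on the resulting synthetic pair. The `translate` callable is an assumed wrapper around the current NMT model, not a real library API.

```python
import random

def random_online_backtranslation(batch, languages, translate):
    """For each (src, tgt) pair, back-translate the target sentence into a randomly
    chosen intermediate language with the current model and keep the synthetic
    (intermediate -> target) pair, so unseen zero-shot directions receive supervision.

    `translate(text, src, tgt)` is an assumed wrapper around the current NMT model.
    """
    synthetic = []
    for src_lang, tgt_lang, _src_text, tgt_text in batch:
        pivot = random.choice([l for l in languages if l not in (src_lang, tgt_lang)])
        pivot_text = translate(tgt_text, src=tgt_lang, tgt=pivot)
        synthetic.append((pivot, tgt_lang, pivot_text, tgt_text))
    return synthetic
```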
arXiv Detail & Related papers (2020-04-24T17:21:32Z)