Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
- URL: http://arxiv.org/abs/2505.05111v1
- Date: Thu, 08 May 2025 10:24:44 GMT
- Title: Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
- Authors: Boyi Deng, Yu Wan, Yidan Zhang, Baosong Yang, Fuli Feng
- Abstract summary: We introduce a novel metric to assess the monolinguality of features obtained from SAEs. We show that ablating these SAE features significantly reduces abilities in only one language of LLMs, leaving others almost unaffected. We leverage these SAE-derived language-specific features to enhance steering vectors, achieving control over the language generated by LLMs.
- Score: 41.1110443501488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mechanisms behind multilingual capabilities in Large Language Models (LLMs) have been examined using neuron-based or internal-activation-based methods. However, these methods often face challenges such as superposition and layer-wise activation variance, which limit their reliability. Sparse Autoencoders (SAEs) offer a more nuanced analysis by decomposing the activations of LLMs into sparse linear combinations of SAE features. We introduce a novel metric to assess the monolinguality of features obtained from SAEs, discovering that some features are strongly related to specific languages. Additionally, we show that ablating these SAE features significantly reduces abilities in only one language of LLMs, leaving others almost unaffected. Interestingly, we find that some languages have multiple synergistic SAE features, and ablating them together yields greater improvement than ablating them individually. Moreover, we leverage these SAE-derived language-specific features to enhance steering vectors, achieving control over the language generated by LLMs.
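To make the abstract concrete, the following minimal sketch (not the authors' released code; the dimensions, the random weights standing in for a trained SAE, and the exact form of the monolinguality score are illustrative assumptions) shows how an SAE encodes hidden states into sparse feature activations, how a per-language specificity score could be computed, and how a feature's decoder direction can be ablated or added as a steering direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (not the paper's): residual-stream width and SAE dictionary size.
d_model, n_features = 64, 512
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)

def sae_encode(h):
    """Map hidden states (n_tokens, d_model) to sparse, non-negative feature activations."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def sae_decode(f):
    """Reconstruct hidden states as a sparse linear combination of decoder directions."""
    return f @ W_dec

def monolinguality(acts_by_lang, j):
    """Hypothetical score for feature j: mean activation in its strongest language
    divided by its mean activation in the other languages (higher = more language-specific)."""
    means = {lang: acts[:, j].mean() for lang, acts in acts_by_lang.items()}
    top = max(means, key=means.get)
    rest = [v for lang, v in means.items() if lang != top]
    return top, means[top] / (float(np.mean(rest)) + 1e-6)

def ablate(h, j):
    """Remove feature j's contribution from the hidden states (feature ablation)."""
    f = sae_encode(h)
    return h - np.outer(f[:, j], W_dec[j])

def steer(h, j, alpha=4.0):
    """Push hidden states along feature j's decoder direction (language steering)."""
    return h + alpha * W_dec[j]

# Toy usage with random "activations" standing in for two languages.
acts = {"en": sae_encode(rng.normal(size=(100, d_model))),
        "zh": sae_encode(rng.normal(size=(100, d_model)))}
lang, score = monolinguality(acts, j=3)
print(lang, round(score, 3))
```

In practice the encoder and decoder would be the weights of an SAE trained on a specific LLM layer, the activations would be collected from parallel text in each language, and ablation or steering would be applied inside the model's forward pass rather than to detached arrays.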
Related papers
- Causal Language Control in Multilingual Transformers via Sparse Feature Steering [3.790013563494571]
We investigate whether sparse autoencoder features can be leveraged to steer the generated language of multilingual language models. We achieve controlled language shifts with up to 90% success, as measured by FastText language classification (a minimal sketch of such a check appears after this list). Our analysis reveals that language steering is most effective in mid-to-late transformer layers.
arXiv Detail & Related papers (2025-07-17T06:49:16Z)
- Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages [11.19692440351977]
Existing studies often focus on individual neurons, but their polysemantic nature makes it difficult to isolate language-specific units. We introduce SAE-LAPE, a method based on feature activation probability, to identify language-specific features within the feed-forward network. These features influence the model's multilingual performance and language output and can be used for language identification with performance comparable to fastText.
arXiv Detail & Related papers (2025-07-15T12:00:30Z)
- The Emergence of Abstract Thought in Large Language Models Beyond Any Language [95.50197866832772]
Large language models (LLMs) function effectively across a diverse range of languages. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. Recent results show strong multilingual performance, even surpassing English performance on specific tasks in other languages.
arXiv Detail & Related papers (2025-06-11T16:00:54Z)
- When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance. Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
- Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques.
Our results offer practical suggestions; for example, calibrating in the target language efficiently retains the language modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) can encourage alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications [24.18102112644796]
We study the internal neuron activation patterns of large language models (LLMs) when processing different languages.
We leverage the discovered differences in expert activation frequencies to guide sparse activation and pruning.
Our findings offer new perspectives for applications such as sparse activation and model pruning.
arXiv Detail & Related papers (2024-02-26T07:44:56Z)
- How Vocabulary Sharing Facilitates Multilingualism in LLaMA? [19.136382859468693]
Large Language Models (LLMs) often show strong performance on English tasks, while exhibiting limitations on other languages.
This study examines the multilingual capability of LLMs from the vocabulary-sharing perspective.
arXiv Detail & Related papers (2023-11-15T16:13:14Z)
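The related entry on causal language control reports steering success measured with FastText language classification. Below is a minimal sketch of how such a check could be run, assuming fastText's public lid.176.ftz language-identification model has been downloaded locally from fasttext.cc (the file name and the threshold-free scoring are assumptions, not the paper's exact evaluation protocol).

```python
import fasttext

# Assumption: fastText's public language-identification model has been
# downloaded locally as "lid.176.ftz" (available from fasttext.cc).
lid = fasttext.load_model("lid.176.ftz")

def steering_success_rate(generations, target_lang):
    """Fraction of generated texts whose top fastText label matches the target language code."""
    hits = 0
    for text in generations:
        labels, _ = lid.predict(text.replace("\n", " "), k=1)  # predict() rejects newlines
        hits += labels[0] == f"__label__{target_lang}"
    return hits / max(len(generations), 1)

# Example: outputs produced after steering generation toward French.
print(steering_success_rate(["Bonjour, comment allez-vous ?"], "fr"))
```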
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.