Related papers: Causal Language Control in Multilingual Transformers via Sparse Feature Steering

URL: http://arxiv.org/abs/2507.13410v1
Date: Thu, 17 Jul 2025 06:49:16 GMT
Title: Causal Language Control in Multilingual Transformers via Sparse Feature Steering
Authors: Cheng-Ting Chou, George Liu, Jessica Sun, Cole Blondin, Kevin Zhu, Vasu Sharma, Sean O'Brien
Abstract summary: We investigate whether sparse autoencoder features can be leveraged to steer the generated language of multilingual language models. We achieve controlled language shifts with up to 90% success, as measured by FastText language classification. Our analysis reveals that language steering is most effective in mid-to-late transformer layers.
Score: 3.790013563494571
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deterministically controlling the target generation language of large multilingual language models (LLMs) remains a fundamental challenge, particularly in zero-shot settings where neither explicit language prompts nor fine-tuning are available. In this work, we investigate whether sparse autoencoder (SAE) features, previously shown to correlate with interpretable model behaviors, can be leveraged to steer the generated language of LLMs during inference. Leveraging pretrained SAEs on the residual streams of Gemma-2B and Gemma-9B, we identify features whose activations differ most significantly between English and four target languages: Chinese, Japanese, Spanish, and French. By modifying just a single SAE feature at one transformer layer, we achieve controlled language shifts with up to 90% success, as measured by FastText language classification, while preserving semantic fidelity according to LaBSE (Language-Agnostic BERT Sentence Embedding) similarity. Our analysis reveals that language steering is most effective in mid-to-late transformer layers and is amplified by specific attention heads disproportionately associated with language-sensitive SAE features. These results demonstrate the promise of sparse feature steering as a lightweight and interpretable mechanism for controllable multilingual generation.
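
Illustrative sketch (not from the paper): the abstract describes two steps, picking an SAE feature whose activation differs most between English and a target language, then modifying that single feature at one layer. The toy example below shows one plausible reading of those steps on synthetic NumPy arrays. Everything here is an assumption made for illustration: the function names `pick_language_feature` and `steer_residual`, the "largest mean activation gap" selection rule, and steering by adding the feature's decoder direction to the residual stream. The paper's actual selection statistic, intervention, models, and SAE weights are not reproduced.

```python
import numpy as np


def pick_language_feature(acts_en, acts_tgt):
    """Return the index of the feature with the largest mean activation gap.

    acts_en, acts_tgt: arrays of shape (n_tokens, n_features) holding SAE
    feature activations on English and target-language text (hypothetical).
    """
    gap = acts_tgt.mean(axis=0) - acts_en.mean(axis=0)
    return int(np.argmax(gap))


def steer_residual(resid, decoder, feature_idx, strength):
    """Add a scaled unit-norm decoder direction to every residual position.

    resid: (n_positions, d_model) residual-stream activations.
    decoder: (n_features, d_model) SAE decoder matrix (hypothetical layout).
    """
    direction = decoder[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return resid + strength * direction


# Synthetic stand-ins for real activations and SAE weights (assumed shapes).
rng = np.random.default_rng(0)
n_features, d_model = 8, 4
acts_en = rng.random((16, n_features))
acts_tgt = rng.random((16, n_features))
acts_tgt[:, 5] += 3.0  # make feature 5 fire strongly on the "target language"
decoder = rng.normal(size=(n_features, d_model))
resid = rng.normal(size=(3, d_model))

idx = pick_language_feature(acts_en, acts_tgt)
steered = steer_residual(resid, decoder, idx, strength=4.0)
print("selected feature:", idx)
print("residual shift along feature direction:",
      (steered - resid) @ (decoder[idx] / np.linalg.norm(decoder[idx])))
```

In a real setting the activation arrays would come from running a pretrained SAE over model activations at the chosen layer, and the steering strength would need tuning against a language classifier and a semantic-similarity check such as the FastText and LaBSE measures the abstract mentions.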

Related papers

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders [51.380449540006985]
Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does performance still favor the dominant training language? We analyze their internal mechanisms using cross-layer transcoders (CLT) and attribution graphs.
arXiv Detail & Related papers (2025-11-13T22:51:06Z)
Language steering in latent space to mitigate unintended code-switching [1.1330938617817454]
Large Language Models (LLMs) often exhibit unintended code-switching, reducing reliability in downstream tasks. We propose latent-space language steering, a lightweight inference-time method that identifies language directions via PCA on parallel translations. Our approach mitigates code-switching while preserving semantics with negligible computational overhead.
arXiv Detail & Related papers (2025-10-11T19:49:38Z)
Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages [11.19692440351977]
We introduce SAE-LAPE, a method based on feature activation probability, to identify language-specific features within the feed-forward network. We find that many such features predominantly appear in the middle to final layers of the model and are interpretable. These features influence the model's multilingual performance and language output and can be used for language identification with performance comparable to fastText.
arXiv Detail & Related papers (2025-07-15T12:00:30Z)
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders [41.1110443501488]
We introduce a novel metric to assess the monolinguality of features obtained from SAEs. We show that ablating these SAE features significantly reduces an LLM's abilities in only one language, leaving the others almost unaffected. We leverage these SAE-derived language-specific features to enhance steering vectors, achieving control over the language generated by LLMs.
arXiv Detail & Related papers (2025-05-08T10:24:44Z)
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively. However, Whisper struggles with unseen languages, those not included in its pre-training. We propose methods that exploit these relationships to enhance ASR performance on unseen languages.
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model [14.39119862985503]
We aim to create a multilingual ALT system with available datasets. Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario. We evaluate the performance of the multilingual model in comparison to its monolingual counterparts.
arXiv Detail & Related papers (2024-06-25T15:02:32Z)
Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages. It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language. The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world. We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.