Revisiting Modality Invariance in a Multilingual Speech-Text Model via Neuron-Level Analysis
- URL: http://arxiv.org/abs/2601.17387v1
- Date: Sat, 24 Jan 2026 09:22:18 GMT
- Title: Revisiting Modality Invariance in a Multilingual Speech-Text Model via Neuron-Level Analysis
- Authors: Toshiki Nakai, Varsha Suresh, Vera Demberg
- Abstract summary: We investigate where language and modality information is encoded, how selective neurons causally influence decoding, and how concentrated this influence is across the network. We identify language- and modality-selective neurons using average-precision ranking, investigate their functional role via median-replacement interventions at inference time, and analyze activation-magnitude inequality across languages and modalities.
- Score: 15.638379666159127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual speech-text foundation models aim to process language uniformly across both modality and language, yet it remains unclear whether they internally represent the same language consistently when it is spoken versus written. We investigate this question in SeamlessM4T v2 through three complementary analyses that probe where language and modality information is encoded, how selective neurons causally influence decoding, and how concentrated this influence is across the network. We identify language- and modality-selective neurons using average-precision ranking, investigate their functional role via median-replacement interventions at inference time, and analyze activation-magnitude inequality across languages and modalities. Across experiments, we find evidence of incomplete modality invariance. Although encoder representations become increasingly language-agnostic, this compression makes it more difficult for the shared decoder to recover the language of origin when constructing modality-agnostic representations, particularly when adapting from speech to text. We further observe sharply localized modality-selective structure in cross-attention key and value projections. Finally, speech-conditioned decoding and non-dominant scripts exhibit higher activation concentration, indicating heavier reliance on a small subset of neurons, which may underlie increased brittleness across modalities and languages.
Related papers
- When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training [57.230355403478995]
We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM. We find that shared concept spaces emerge early and continue to refine, but that alignment with them is language-dependent. In contrast to prior work, our fine-grained manual analysis reveals that some apparent gains in translation quality reflect shifts in behavior.
arXiv Detail & Related papers (2026-01-30T11:23:01Z) - Graph Modelling Analysis of Speech-Gesture Interaction for Aphasia Severity Estimation [0.0]
Aphasia is an acquired language disorder caused by injury to the regions of the brain that are responsible for language. Recent advancements in speech analysis focus on automated estimation of aphasia severity from spontaneous speech. In this work, we propose a graph neural network-based framework for estimating aphasia severity.
arXiv Detail & Related papers (2026-01-27T14:11:36Z) - Coherence in the brain unfolds across separable temporal regimes [1.3874648807526748]
Coherence in language requires the brain to satisfy two competing temporal demands. We show that coherence is implemented through dissociable neural regimes of slow contextual integration and rapid event-driven reconfiguration.
arXiv Detail & Related papers (2025-12-23T16:16:42Z) - Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion. We show that confusion points (CPs) are central to this phenomenon. We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z) - Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training [58.696660064190475]
We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching.
arXiv Detail & Related papers (2025-04-02T15:09:58Z) - Decoding Continuous Character-based Language from Non-invasive Brain Recordings [33.11373366800627]
We propose a novel approach to decoding continuous language from single-trial non-invasive fMRI recordings.
A character-based decoder is designed for the semantic reconstruction of continuous language characterized by inherent character structures.
The ability to decode continuous language from single trials across subjects demonstrates the promising applications of non-invasive language brain-computer interfaces.
arXiv Detail & Related papers (2024-03-17T12:12:33Z) - Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks [0.0]
We train a recurrent neural network on a language identification task over a large database of speech recordings in 21 languages.
The network was able to identify the language of 10-second recordings in 40% of the cases, and the language was in the top-3 guesses in two-thirds of the cases.
arXiv Detail & Related papers (2024-01-22T09:49:44Z) - BrainLLM: Generative Language Decoding from Brain Recordings [77.66707255697706]
We propose a generative language BCI that utilizes the capacity of a large language model and a semantic brain decoder. The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli. Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.
arXiv Detail & Related papers (2023-11-16T13:37:21Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)