Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
- URL: http://arxiv.org/abs/2601.03115v1
- Date: Tue, 06 Jan 2026 15:46:35 GMT
- Title: Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
- Authors: Xiutian Zhao, Björn Schuller, Berrak Sisman
- Abstract summary: We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in large audio-language models (LALMs). We compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature.
- Score: 8.550786156000461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion is a central dimension of spoken communication, yet we still lack a mechanistic account of how modern large audio-language models (LALMs) encode it internally. We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in LALMs and provide causal evidence that such units exist in Qwen2.5-Omni, Kimi-Audio, and Audio Flamingo 3. Across these three widely used open-source models, we compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature: ablating neurons selected for a given emotion disproportionately degrades recognition of that emotion while largely preserving other classes, whereas gain-based amplification steers predictions toward the target emotion. These effects arise with modest identification data and scale systematically with intervention strength. We further observe that ESNs exhibit non-uniform layer-wise clustering with partial cross-dataset transfer. Taken together, our results offer a causal, neuron-level account of emotion decisions in LALMs and highlight targeted neuron interventions as an actionable handle for controllable affective behaviors.
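The selector-plus-intervention recipe in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of a contrast-based selector and an inference-time hook that ablates (gain = 0) or amplifies (gain > 1) the chosen units; the module choice, activation-capture setup, and selector details are assumptions, not the authors' implementation.

```python
# Hedged sketch of a contrast-based neuron selector and an inference-time
# intervention hook, in the spirit of the paper. Assumes the hooked module
# returns a plain tensor (e.g., an MLP activation); real LALM internals vary.
import torch

def contrast_select(acts_target, acts_other, k=50):
    """Rank hidden units by the gap between their mean activation on
    target-emotion clips and on all other clips; keep the top k."""
    gap = acts_target.mean(dim=0) - acts_other.mean(dim=0)
    return torch.topk(gap, k).indices

def make_intervention_hook(neuron_ids, gain):
    """gain = 0.0 ablates the selected units; gain > 1.0 amplifies them."""
    def hook(module, inputs, output):
        output[..., neuron_ids] = output[..., neuron_ids] * gain
        return output
    return hook

@torch.no_grad()
def run_with_intervention(model, inputs, mlp_module, neuron_ids, gain):
    """Attach the hook to one module, run the model, then detach."""
    handle = mlp_module.register_forward_hook(
        make_intervention_hook(neuron_ids, gain))
    try:
        return model(**inputs)
    finally:
        handle.remove()  # always restore the unmodified model
```

Sweeping `gain` from 0 upward in such a setup would mirror the paper's observation that intervention effects scale systematically with intervention strength.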
Related papers
- Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition [56.00118641432005]
We propose a Memory-guided Prototypical Co-occurrence Learning framework that explicitly models emotion co-occurrence patterns. Inspired by human cognitive memory systems, we introduce a memory retrieval strategy to extract semantic-level co-occurrence associations. Our model learns affectively informative representations for accurate emotion distribution prediction.
arXiv Detail & Related papers (2026-02-24T04:11:25Z)
- Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering [60.23509717784518]
Existing mitigation methods predominantly focus on output-level adjustments, leaving the internal mechanisms that give rise to hallucinations largely unexplored. We propose Contrastive Neuron Steering (CNS), which identifies image-specific neurons via contrastive analysis between clean and noisy inputs. CNS selectively amplifies informative neurons while suppressing perturbation-induced activations, producing more robust and semantically grounded visual representations.
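The identification step here admits a brief sketch. Everything below (the `capture_mean_acts` helper, casting the clean-vs-noisy contrast as a simple activation difference) is an assumption about how such a selector could look, not the paper's code.

```python
# Hypothetical sketch of CNS-style contrastive neuron identification:
# units whose mean activation drops most under input noise are treated as
# informative (to amplify), those that rise most as perturbation-induced
# (to suppress). `capture_mean_acts` stands in for an activation-recording pass.
import torch

@torch.no_grad()
def split_neurons(capture_mean_acts, model, clean, noisy, k=50):
    a_clean = capture_mean_acts(model, clean)   # shape: (num_neurons,)
    a_noisy = capture_mean_acts(model, noisy)
    informative = torch.topk(a_clean - a_noisy, k).indices  # amplify these
    spurious = torch.topk(a_noisy - a_clean, k).indices     # suppress these
    return informative, spurious
```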
arXiv Detail & Related papers (2026-01-31T09:21:04Z)
- E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present E^2-LLM (EEG-to-Emotion Large Language Model), the first MLLM framework for interpretable emotion analysis from EEG. E^2-LLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline. Experiments on the dataset, spanning seven emotion categories, demonstrate that E^2-LLM achieves strong performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z)
- Decoding Predictive Inference in Visual Language Processing via Spatiotemporal Neural Coherence [2.208251557767776]
We present a machine learning framework for decoding neural responses to visual language stimuli in Deaf signers. Our results reveal distributed left-hemispheric and low-frequency coherence as key features in language comprehension. This work demonstrates a novel approach for probing experience-driven generative models of perception in the brain.
arXiv Detail & Related papers (2025-12-24T04:19:20Z)
- Do LLMs "Feel"? Emotion Circuits Discovery and Control [54.57583855608979]
We study the internal mechanisms that give rise to emotional expression and enable control of emotions in generated text. This is the first systematic study to uncover and validate emotion circuits in large language models.
arXiv Detail & Related papers (2025-10-13T12:24:24Z)
- Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports [18.336392633341493]
We show that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations.
arXiv Detail & Related papers (2025-09-29T05:22:33Z)
- Decoding Neural Emotion Patterns through Large Language Model Embeddings [3.8032942955371785]
We propose a computational framework that maps textual emotional content to anatomically defined brain regions without requiring neuroimaging. Using OpenAI's text-embedding-ada-002, we generate high-dimensional semantic representations, apply dimensionality reduction and clustering to identify emotional groups, and map them to 18 brain regions linked to emotional processing. This cost-effective, scalable approach enables large-scale analysis of naturalistic language, distinguishes between clinical populations, and offers a brain-based benchmark for evaluating AI emotional expression.
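The embed-reduce-cluster-map pipeline described in this abstract can be illustrated with a short, hedged sketch. Here `embed_texts`, the component count, and the cluster count are placeholder assumptions, and the final region-mapping step is only indicated in a comment.

```python
# Hedged sketch of an embed -> reduce -> cluster pipeline like the one above.
# `embed_texts` is a stand-in for any sentence-embedding call.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def emotion_groups(texts, embed_texts, n_components=50, n_clusters=8, seed=0):
    X = np.asarray(embed_texts(texts))      # (n_texts, embed_dim) embeddings
    X_red = PCA(n_components=n_components,
                random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_red)
    # Each cluster would then be assigned to one of the 18 brain regions
    # linked to emotional processing, via the paper's mapping step.
    return labels
```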
arXiv Detail & Related papers (2025-08-12T20:51:56Z)
- CAST-Phys: Contactless Affective States Through Physiological signals Database [74.28082880875368]
The lack of affective multi-modal datasets remains a major bottleneck in developing accurate emotion recognition systems. We present the Contactless Affective States Through Physiological Signals Database (CAST-Phys), a novel high-quality dataset enabling remote physiological emotion recognition. Our analysis highlights the crucial role of physiological signals in realistic scenarios where facial expressions alone may not provide sufficient emotional information.
arXiv Detail & Related papers (2025-07-08T15:20:24Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework that disentangles identity from emotion and cooperates emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Emotion Analysis on EEG Signal Using Machine Learning and Neural Network [0.0]
The main purpose of this study is to improve emotion recognition performance using brain signals.
Research on human-machine interaction technologies has been ongoing for a long time, and in recent years researchers have had great success in automatically recognizing emotion from brain signals.
arXiv Detail & Related papers (2023-07-09T09:50:34Z)
- EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for ERC task.
Our EmotionIC consists of three main components: Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF).
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)