Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
- URL: http://arxiv.org/abs/2601.03115v1
- Date: Tue, 06 Jan 2026 15:46:35 GMT
- Title: Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
- Authors: Xiutian Zhao, Björn Schuller, Berrak Sisman
- Abstract summary: We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in large audio-language models (LALMs). We compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature.
- Score: 8.550786156000461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion is a central dimension of spoken communication, yet we still lack a mechanistic account of how modern large audio-language models (LALMs) encode it internally. We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in LALMs and provide causal evidence that such units exist in Qwen2.5-Omni, Kimi-Audio, and Audio Flamingo 3. Across these three widely used open-source models, we compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature: ablating neurons selected for a given emotion disproportionately degrades recognition of that emotion while largely preserving other classes, whereas gain-based amplification steers predictions toward the target emotion. These effects arise with modest identification data and scale systematically with intervention strength. We further observe that ESNs exhibit non-uniform layer-wise clustering with partial cross-dataset transfer. Taken together, our results offer a causal, neuron-level account of emotion decisions in LALMs and highlight targeted neuron interventions as an actionable handle for controllable affective behaviors.
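The selector-plus-intervention recipe in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of a contrast-based selector and an inference-time hook that ablates (gain = 0) or amplifies (gain > 1) the chosen units; the module choice, activation-capture setup, and selector details are assumptions, not the authors' implementation.

```python
# Hedged sketch of a contrast-based neuron selector and an inference-time
# intervention hook, in the spirit of the paper. Assumes the hooked module
# returns a plain tensor (e.g., an MLP activation); real LALM internals vary.
import torch

def contrast_select(acts_target, acts_other, k=50):
    """Rank hidden units by the gap between their mean activation on
    target-emotion clips and on all other clips; keep the top k."""
    gap = acts_target.mean(dim=0) - acts_other.mean(dim=0)
    return torch.topk(gap, k).indices

def make_intervention_hook(neuron_ids, gain):
    """gain = 0.0 ablates the selected units; gain > 1.0 amplifies them."""
    def hook(module, inputs, output):
        output[..., neuron_ids] = output[..., neuron_ids] * gain
        return output
    return hook

@torch.no_grad()
def run_with_intervention(model, inputs, mlp_module, neuron_ids, gain):
    """Attach the hook to one module, run the model, then detach."""
    handle = mlp_module.register_forward_hook(
        make_intervention_hook(neuron_ids, gain))
    try:
        return model(**inputs)
    finally:
        handle.remove()  # always restore the unmodified model
```

Sweeping `gain` from 0 upward in such a setup would mirror the paper's observation that intervention effects scale systematically with intervention strength.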
Related papers
- Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition [56.00118641432005]
We propose a Memory-guided Prototypical Co-occurrence Learning framework that explicitly models emotion co-occurrence patterns. Inspired by human cognitive memory systems, we introduce a memory retrieval strategy to extract semantic-level co-occurrence associations. Our model learns affectively informative representations for accurate emotion distribution prediction.
arXiv Detail & Related papers (2026-02-24T04:11:25Z)
- Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering [60.23509717784518]
Existing mitigation methods predominantly focus on output-level adjustments, leaving the internal mechanisms that give rise to hallucinations largely unexplored. We propose Contrastive Neuron Steering (CNS), which identifies image-specific neurons via contrastive analysis between clean and noisy inputs. CNS selectively amplifies informative neurons while suppressing perturbation-induced activations, producing more robust and semantically grounded visual representations.
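The identification step here admits a brief sketch. Everything below (the `capture_mean_acts` helper, casting the clean-vs-noisy contrast as a simple activation difference) is an assumption about how such a selector could look, not the paper's code.

```python
# Hypothetical sketch of CNS-style contrastive neuron identification:
# units whose mean activation drops most under input noise are treated as
# informative (to amplify), those that rise most as perturbation-induced
# (to suppress). `capture_mean_acts` stands in for an activation-recording pass.
import torch

@torch.no_grad()
def split_neurons(capture_mean_acts, model, clean, noisy, k=50):
    a_clean = capture_mean_acts(model, clean)   # shape: (num_neurons,)
    a_noisy = capture_mean_acts(model, noisy)
    informative = torch.topk(a_clean - a_noisy, k).indices  # amplify these
    spurious = torch.topk(a_noisy - a_clean, k).indices     # suppress these
    return informative, spurious
```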
arXiv Detail & Related papers (2026-01-31T09:21:04Z)
- E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present E^2-LLM (EEG-to-Emotion Large Language Model), the first MLLM framework for interpretable emotion analysis from EEG. E^2-LLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline. Experiments on the dataset, spanning seven emotion categories, demonstrate that E^2-LLM achieves strong performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z)
- Decoding Predictive Inference in Visual Language Processing via Spatiotemporal Neural Coherence [2.208251557767776]
We present a machine learning framework for decoding neural responses to visual language stimuli in Deaf signers. Our results reveal distributed left-hemispheric and low-frequency coherence as key features in language comprehension. This work demonstrates a novel approach for probing experience-driven generative models of perception in the brain.
arXiv Detail & Related papers (2025-12-24T04:19:20Z)
- Do LLMs "Feel"? Emotion Circuits Discovery and Control [54.57583855608979]
We study the internal mechanisms that give rise to emotional expression and enable control of emotions in generated text. This is the first systematic study to uncover and validate emotion circuits in large language models.
arXiv Detail & Related papers (2025-10-13T12:24:24Z)
- Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports [18.336392633341493]
We show that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations.
arXiv Detail & Related papers (2025-09-29T05:22:33Z)
- Decoding Neural Emotion Patterns through Large Language Model Embeddings [3.8032942955371785]
We propose a computational framework that maps textual emotional content to anatomically defined brain regions without requiring neuroimaging. Using OpenAI's text-embedding-ada-002, we generate high-dimensional semantic representations, apply dimensionality reduction and clustering to identify emotional groups, and map them to 18 brain regions linked to emotional processing. This cost-effective, scalable approach enables large-scale analysis of naturalistic language, distinguishes between clinical populations, and offers a brain-based benchmark for evaluating AI emotional expression.
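The embed-reduce-cluster-map pipeline described in this abstract can be illustrated with a short, hedged sketch. Here `embed_texts`, the component count, and the cluster count are placeholder assumptions, and the final region-mapping step is only indicated in a comment.

```python
# Hedged sketch of an embed -> reduce -> cluster pipeline like the one above.
# `embed_texts` is a stand-in for any sentence-embedding call.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def emotion_groups(texts, embed_texts, n_components=50, n_clusters=8, seed=0):
    X = np.asarray(embed_texts(texts))      # (n_texts, embed_dim) embeddings
    X_red = PCA(n_components=n_components,
                random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_red)
    # Each cluster would then be assigned to one of the 18 brain regions
    # linked to emotional processing, via the paper's mapping step.
    return labels
```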
arXiv Detail & Related papers (2025-08-12T20:51:56Z)
- CAST-Phys: Contactless Affective States Through Physiological signals Database [74.28082880875368]
The lack of affective multi-modal datasets remains a major bottleneck in developing accurate emotion recognition systems. We present the Contactless Affective States Through Physiological Signals Database (CAST-Phys), a novel high-quality dataset enabling remote physiological emotion recognition. Our analysis highlights the crucial role of physiological signals in realistic scenarios where facial expressions alone may not provide sufficient emotional information.
arXiv Detail & Related papers (2025-07-08T15:20:24Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework that disentangles identity from emotion and cooperates emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Emotion Analysis on EEG Signal Using Machine Learning and Neural Network [0.0]
The main purpose of this study is to improve emotion recognition performance using brain signals.
Research on human-machine interaction technologies has been ongoing for a long time, and in recent years researchers have had great success in automatically recognizing emotion from brain signals.
arXiv Detail & Related papers (2023-07-09T09:50:34Z)
- EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for ERC task.
Our EmotionIC consists of three main components: Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF).
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)