Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
- URL: http://arxiv.org/abs/2509.21381v1
- Date: Tue, 23 Sep 2025 14:52:11 GMT
- Title: Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain
- Authors: Guandong Pan, Yaqian Yang, Shi Chen, Xin Wang, Longzhao Liu, Hongwei Zheng, Shaoting Tang
- Abstract summary: In affective neuroscience and emotion-aware AI, understanding how complex auditory stimuli drive emotion arousal dynamics remains unresolved. This study introduces a computational framework to model the brain's encoding of naturalistic auditory inputs into dynamic behavioral/neural responses. By integrating affective computing and neuroscience, this work uncovers hierarchical mechanisms of auditory-emotion encoding.
- Score: 5.168772989709122
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In affective neuroscience and emotion-aware AI, understanding how complex auditory stimuli drive emotion arousal dynamics remains unresolved. This study introduces a computational framework to model the brain's encoding of naturalistic auditory inputs into dynamic behavioral/neural responses across three datasets (SEED, LIRIS, self-collected BAVE). Guided by neurobiological principles of the parallel auditory hierarchy, we decompose audio into multilevel auditory features (through classical algorithms and wav2vec 2.0/HuBERT) from the original and isolated human-voice/background-soundtrack elements, mapping them to emotion-related responses via cross-dataset analyses. Our analysis reveals that high-level semantic representations (derived from the final layer of wav2vec 2.0/HuBERT) play a dominant role in emotion encoding, outperforming low-level acoustic features with significantly stronger mappings to behavioral annotations and dynamic neural synchrony across most brain regions ($p < 0.05$). Notably, middle layers of wav2vec 2.0/HuBERT (balancing acoustic and semantic information) surpass the final layers in emotion induction across datasets. Moreover, human voices and soundtracks show dataset-dependent emotion-evoking biases aligned with stimulus energy distribution (e.g., LIRIS favors soundtracks due to higher background energy), with neural analyses indicating that voices dominate prefrontal/temporal activity while soundtracks excel in limbic regions. By integrating affective computing and neuroscience, this work uncovers hierarchical mechanisms of auditory-emotion encoding, providing a foundation for adaptive emotion-aware systems and cross-disciplinary explorations of audio-affective interactions.
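To make the layer-wise feature-decomposition step concrete, here is a minimal sketch (not the authors' code) of extracting low-, middle-, and final-layer representations from wav2vec 2.0 with the Hugging Face transformers library; the checkpoint name, layer indices, and placeholder waveform are illustrative assumptions.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-base"  # illustrative checkpoint, not necessarily the paper's
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name).eval()

# Placeholder: 5 s of 16 kHz audio; a real pipeline would load the stimulus here
waveform = np.random.randn(16000 * 5).astype(np.float32)
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the CNN output plus one tensor per transformer layer,
# each shaped (batch, frames, dim); low/middle/final layers trace the
# acoustic-to-semantic hierarchy the abstract describes.
low, middle, final = out.hidden_states[1], out.hidden_states[6], out.hidden_states[-1]
```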
Related papers
- A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses [0.0]
We present a Magnetoencephalography (MEG) dataset collected from trained musicians as they imagined and listened to musical and poetic stimuli. We show that both imagined and perceived brain responses contain consistent, condition-specific information.
arXiv Detail & Related papers (2025-12-03T05:23:10Z) - SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning [54.390403684665834]
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. We propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance.
arXiv Detail & Related papers (2025-08-14T03:01:05Z) - Decoding Neural Emotion Patterns through Large Language Model Embeddings [3.8032942955371785]
We propose a computational framework that maps textual emotional content to anatomically defined brain regions without requiring neuroimaging. Using OpenAI's text-embedding-ada-002, we generate high-dimensional semantic representations, apply dimensionality reduction and clustering to identify emotional groups, and map them to 18 brain regions linked to emotional processing. This cost-effective, scalable approach enables large-scale analysis of naturalistic language, distinguishes between clinical populations, and offers a brain-based benchmark for evaluating AI emotional expression.
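A minimal sketch of the embed-reduce-cluster pipeline described above, using scikit-learn; the embedding width, component count, and cluster count are assumptions for illustration, and the random matrix stands in for real text embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical input: one embedding vector per text passage (ada-style width)
embeddings = np.random.randn(500, 1536)

reduced = PCA(n_components=50).fit_transform(embeddings)       # dimensionality reduction
labels = KMeans(n_clusters=18, n_init=10, random_state=0).fit_predict(reduced)
# Each of the 18 clusters would then be mapped to one emotion-linked brain region
```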
arXiv Detail & Related papers (2025-08-12T20:51:56Z) - Neural Brain: A Neuroscience-inspired Framework for Embodied Agents [58.58177409853298]
Current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. At the core of this challenge lies the concept of the Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges.
arXiv Detail & Related papers (2025-05-12T15:05:34Z) - Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation [59.81482518924723]
We propose a method for capturing and generating subtle emotion-intensity shifts for talking-head generation.
We develop a talking-head framework that is capable of generating a variety of emotions with precise control over intensity levels.
Experiments and analyses validate the effectiveness of our proposed method.
arXiv Detail & Related papers (2024-09-29T01:02:01Z) - R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity [0.12289361708127873]
Music is a universal phenomenon that profoundly influences human experiences across cultures.
This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception.
arXiv Detail & Related papers (2024-06-21T17:11:45Z) - Music Emotion Prediction Using Recurrent Neural Networks [8.867897390286815]
This study aims to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states.
We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories.
Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks.
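As a rough sketch of this pipeline (not the study's implementation), the following extracts MFCCs with Librosa and feeds them to a small bidirectional LSTM that predicts the four Russell quadrants; the synthetic signal, feature choice, and hyperparameters are placeholders.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# Placeholder signal: 5 s at 22.05 kHz; real use would call librosa.load on a clip
sr = 22050
y = np.random.randn(sr * 5).astype(np.float32)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T  # (frames, 20)

class QuadrantLSTM(nn.Module):
    """Bidirectional LSTM over frame-level features -> 4 emotion quadrants."""
    def __init__(self, n_feats=20, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last frame's state

model = QuadrantLSTM()
logits = model(torch.tensor(mfcc).unsqueeze(0))  # (1, 4) quadrant scores
```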
arXiv Detail & Related papers (2024-05-10T18:03:20Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture that is compatible with, and scales within, deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
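Surrogate gradient training makes such spiking networks trainable end to end by replacing the non-differentiable spike with a smooth derivative on the backward pass; below is a minimal PyTorch sketch using a fast-sigmoid surrogate, one common choice rather than this paper's exact formulation.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike forward; fast-sigmoid surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()  # emit a spike where membrane potential exceeds threshold

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Smooth stand-in for the Heaviside derivative: 1 / (1 + |v|)^2
        return grad_output / (1.0 + v.abs()) ** 2

# Usage: differentiable spikes from a batch of membrane potentials
spikes = SurrogateSpike.apply(torch.randn(8, 100, requires_grad=True))
```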
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
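Layer-wise aggregation of SSL representations is often implemented as a learned softmax-weighted sum over hidden layers followed by temporal pooling; the sketch below is one generic illustration of that idea, not the authors' exact module, and the layer count and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LayerWiseAggregation(nn.Module):
    """Learned softmax-weighted sum over SSL layers, then mean over time."""
    def __init__(self, n_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(n_layers))

    def forward(self, hidden_states):  # (n_layers, batch, frames, dim)
        w = torch.softmax(self.weights, dim=0).view(-1, 1, 1, 1)
        fused = (w * hidden_states).sum(dim=0)  # combine layers -> (batch, frames, dim)
        return fused.mean(dim=1)                # temporal average -> (batch, dim)

agg = LayerWiseAggregation(n_layers=13)
h = torch.randn(13, 2, 200, 768)  # e.g. wav2vec2-base hidden states, stacked
pooled = agg(h)                   # (2, 768) utterance-level representation
```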
arXiv Detail & Related papers (2023-03-14T16:08:45Z) - Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate for modeling speech processing in the brain.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
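Such brain-likeness is commonly quantified with a linear encoding model from model activations to neural responses; here is a minimal ridge-regression sketch with scikit-learn, where the random arrays and shapes are placeholders for time-aligned features and recordings.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Placeholders: X = (timepoints, model features) aligned to the stimulus,
# Y = (timepoints, voxels or channels) of neural responses
X = np.random.randn(1000, 768)
Y = np.random.randn(1000, 64)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)
enc = RidgeCV(alphas=np.logspace(-2, 5, 8)).fit(X_tr, Y_tr)
pred = enc.predict(X_te)

# Encoding score: per-channel Pearson r between predicted and held-out responses
r = [np.corrcoef(pred[:, i], Y_te[:, i])[0, 1] for i in range(Y.shape[1])]
print(f"mean encoding r = {np.mean(r):.3f}")
```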
arXiv Detail & Related papers (2022-06-03T17:01:46Z) - Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and Latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality for EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
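A reverse domain discriminator is typically built on a gradient reversal layer; here is a generic PyTorch sketch (the latent dimension and discriminator depth are assumptions), in which the reversed gradient pushes the EEG and music encoders toward a shared latent distribution.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -lambda on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lamb, None

class DomainDiscriminator(nn.Module):
    """Classifies whether a latent came from EEG or music; training it through
    the reversed gradient makes the two modality distributions align."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, z, lamb=1.0):
        return self.net(GradReverse.apply(z, lamb))

disc = DomainDiscriminator()
logits = disc(torch.randn(16, 128))  # (16, 2) modality predictions
```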
arXiv Detail & Related papers (2022-02-20T07:32:12Z) - Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers [0.0]
Current computational-emotion research has focused on applying acoustic properties to mathematically model how emotions are perceived.
This paper seeks to reflect and expand upon the findings of related research and present a stepping-stone toward this end goal.
arXiv Detail & Related papers (2021-05-01T05:47:15Z)