Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion
- URL: http://arxiv.org/abs/2508.04723v1
- Date: Tue, 05 Aug 2025 12:25:35 GMT
- Title: Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion
- Authors: Sha Zhao, Song Yi, Yangxuan Zhou, Jiadong Pan, Jiquan Wang, Jie Xia, Shijian Li, Shurong Dong, Gang Pan
- Abstract summary: MEEtBrain is a portable, multimodal framework for emotion analysis (valence/arousal). It integrates AI-generated music stimuli with EEG-fNIRS acquisition via a wireless headband. A 14-hour dataset from 20 participants was collected to validate the framework's efficacy.
- Score: 11.122272456519227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotions critically influence mental health, driving interest in music-based affective computing via neurophysiological signals and brain-computer interface techniques. While prior studies leverage music's accessibility for emotion induction, three key limitations persist: (1) Stimulus constraints: music stimuli are confined to small corpora due to copyright and curation costs, with selection biases from heuristic emotion-music mappings that ignore individual affective profiles. (2) Modality specificity: overreliance on unimodal neural data (e.g., EEG) ignores complementary insights from cross-modal signal fusion. (3) Portability limitations: cumbersome setups (e.g., 64+ channel gel-based EEG caps) hinder real-world applicability due to procedural complexity and portability barriers. To address these limitations, we propose MEEtBrain, a portable, multimodal framework for emotion analysis (valence/arousal) that integrates AI-generated music stimuli with synchronized EEG-fNIRS acquisition via a wireless headband. With MEEtBrain, music stimuli can be generated automatically by AI at scale, eliminating subjective selection biases while ensuring musical diversity. Our portable device, designed as a lightweight headband with dry electrodes, simultaneously collects EEG and fNIRS recordings. A 14-hour dataset from 20 participants was collected in the first recruitment to validate the framework's efficacy, with AI-generated music eliciting the target emotions (valence/arousal). We are actively expanding the multimodal dataset (44 participants in the latest release) and making it publicly available to promote further research and practical applications. The dataset is available at https://zju-bmi-lab.github.io/ZBra.
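The abstract describes fusing synchronized EEG and fNIRS recordings for valence/arousal analysis. As a minimal illustrative sketch (not the paper's actual pipeline), one common fusion baseline concatenates per-channel EEG band powers with slow fNIRS hemodynamic levels over a synchronized window; the channel counts, sampling rates, and band choices below are assumptions for illustration only:

```python
import numpy as np

def bandpower(eeg, fs, lo, hi):
    """Average power of each EEG channel in the [lo, hi) Hz band via the periodogram."""
    freqs = np.fft.rfftfreq(eeg.shape[-1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2 / eeg.shape[-1]
    band = (freqs >= lo) & (freqs < hi)
    return psd[..., band].mean(axis=-1)

def fuse_features(eeg_win, fnirs_win, fs_eeg):
    """Concatenate per-channel EEG band powers with mean fNIRS levels per channel."""
    bands = [(4, 8), (8, 13), (13, 30)]  # theta, alpha, beta
    eeg_feats = np.concatenate([bandpower(eeg_win, fs_eeg, lo, hi) for lo, hi in bands])
    fnirs_feats = fnirs_win.mean(axis=-1)  # slow hemodynamic trend per channel
    return np.concatenate([eeg_feats, fnirs_feats])

# Toy synchronized 4 s window: 8 EEG channels @ 250 Hz, 4 fNIRS channels @ 10 Hz.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 1000))
fnirs = rng.standard_normal((4, 40))
feats = fuse_features(eeg, fnirs, fs_eeg=250)
print(feats.shape)  # (8 channels x 3 bands + 4 fNIRS channels,) = (28,)
```

A feature vector like this could then feed any valence/arousal classifier; the actual MEEtBrain framework may use a different representation.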
Related papers
- CAST-Phys: Contactless Affective States Through Physiological signals Database [74.28082880875368]
The lack of affective multi-modal datasets remains a major bottleneck in developing accurate emotion recognition systems. We present the Contactless Affective States Through Physiological Signals Database (CAST-Phys), a novel high-quality dataset capable of remote physiological emotion recognition. Our analysis highlights the crucial role of physiological signals in realistic scenarios where facial expressions alone may not provide sufficient emotional information.
arXiv Detail & Related papers (2025-07-08T15:20:24Z) - Neural Brain: A Neuroscience-inspired Framework for Embodied Agents [58.58177409853298]
Current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. At the core of this challenge lies the concept of the Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges.
arXiv Detail & Related papers (2025-05-12T15:05:34Z) - Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework that disentangles identity from emotion and models cooperation among emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z) - MEEG and AT-DGNN: Improving EEG Emotion Recognition with Music Introducing and Graph-based Learning [3.840859750115109]
We present the MEEG dataset, a multi-modal collection of music-induced electroencephalogram (EEG) recordings.
We introduce the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition.
arXiv Detail & Related papers (2024-07-08T01:58:48Z) - R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity [0.12289361708127873]
Music is a universal phenomenon that profoundly influences human experiences across cultures.
This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception.
arXiv Detail & Related papers (2024-06-21T17:11:45Z) - Emotion-aware Personalized Music Recommendation with a Heterogeneity-aware Deep Bayesian Network [8.844728473984766]
We propose a Heterogeneity-aware Deep Bayesian Network (HDBN) to model these assumptions. The HDBN mimics a user's decision process of choosing music with four components. Our method significantly outperforms baseline approaches on the HR, Precision, NDCG, and MRR metrics.
arXiv Detail & Related papers (2024-06-20T08:12:11Z) - Music Emotion Prediction Using Recurrent Neural Networks [8.867897390286815]
This study aims to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states.
We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories.
Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks.
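The abstract above categorizes music by Russell's Emotion Quadrant, i.e., the sign of valence and arousal selects one of four affective regions. A minimal sketch of that mapping (the quadrant labels below are common illustrative choices, not necessarily the ones used in the paper):

```python
def russell_quadrant(valence, arousal):
    """Map continuous valence/arousal in [-1, 1] to one of Russell's four quadrants."""
    if valence >= 0 and arousal >= 0:
        return "happy/excited"   # Q1: positive valence, high arousal
    if valence < 0 and arousal >= 0:
        return "angry/tense"     # Q2: negative valence, high arousal
    if valence < 0:
        return "sad/depressed"   # Q3: negative valence, low arousal
    return "calm/relaxed"        # Q4: positive valence, low arousal

print(russell_quadrant(0.7, 0.5))   # happy/excited
print(russell_quadrant(0.6, -0.4))  # calm/relaxed
```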
arXiv Detail & Related papers (2024-05-10T18:03:20Z) - fMRI from EEG is only Deep Learning away: the use of interpretable DL to unravel EEG-fMRI relationships [68.8204255655161]
We present an interpretable domain grounded solution to recover the activity of several subcortical regions from multichannel EEG data.
We recover individual spatial and time-frequency patterns of scalp EEG predictive of the hemodynamic signal in the subcortical nuclei.
arXiv Detail & Related papers (2022-10-23T15:11:37Z) - Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and Latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
arXiv Detail & Related papers (2022-02-20T07:32:12Z) - EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z) - An Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals [8.338268870275877]
We propose an end-to-end multimodal framework, the 1-dimensional residual temporal and channel attention network (RTCAN-1D).
For EDA features, the novel convex optimization-based EDA (CvxEDA) method is applied to decompose EDA signals into phasic and tonic components.
For music features, we process the music signal with the open-source toolkit openSMILE to obtain external feature vectors.
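The RTCAN-1D abstract relies on decomposing EDA into a slow tonic level and a fast phasic component. CvxEDA itself solves a convex program; as a crude, hedged stand-in for illustration only, a moving-average baseline can play the role of the tonic component, with the phasic part as the residual:

```python
import numpy as np

def split_eda(eda, fs, win_s=4.0):
    """Crude tonic/phasic split: tonic = moving average, phasic = residual.
    (A simple stand-in for CvxEDA, which instead fits a convex model of
    sudomotor activity; window length is an illustrative assumption.)"""
    win = max(1, int(win_s * fs))
    kernel = np.ones(win) / win
    tonic = np.convolve(eda, kernel, mode="same")  # slow drift estimate
    phasic = eda - tonic                           # fast skin-conductance responses
    return tonic, phasic

fs = 4  # Hz, a typical wearable EDA sampling rate (assumed)
t = np.arange(0, 60, 1 / fs)
# Synthetic signal: baseline drift plus one exponential-decay response at t = 30 s.
eda = 2.0 + 0.01 * t + 0.3 * np.exp(-np.clip(t - 30, 0, None) / 2) * (t >= 30)
tonic, phasic = split_eda(eda, fs)
print(tonic.shape == eda.shape)  # True: decomposition preserves length
```

By construction `tonic + phasic` reconstructs the input exactly, which is the same additive decomposition CvxEDA produces, just with a far simpler tonic estimator.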
arXiv Detail & Related papers (2020-08-22T03:13:20Z) - An End-to-End Visual-Audio Attention Network for Emotion Recognition in
User-Generated Videos [64.91614454412257]
We propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs).
Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN.
arXiv Detail & Related papers (2020-02-12T15:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.