An Efficient Multimodal Framework for Large Scale Emotion Recognition by
Fusing Music and Electrodermal Activity Signals
- URL: http://arxiv.org/abs/2008.09743v2
- Date: Thu, 2 Dec 2021 03:04:51 GMT
- Title: An Efficient Multimodal Framework for Large Scale Emotion Recognition by
Fusing Music and Electrodermal Activity Signals
- Authors: Guanghao Yin, Shouqian Sun, Dian Yu, Dejian Li and Kejun Zhang
- Abstract summary: We propose an end-to-end multimodal framework, the 1-dimensional residual temporal and channel attention network (RTCAN-1D).
For EDA features, the novel convex optimization-based EDA (CvxEDA) method is applied to decompose EDA signals into phasic and tonic signals.
For music features, we process the music signal with the open-source toolkit openSMILE to obtain external feature vectors.
- Score: 8.338268870275877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considerable attention has been paid to physiological signal-based emotion
recognition in the field of affective computing. Owing to its reliable and
user-friendly acquisition, Electrodermal Activity (EDA) offers a great advantage in
practical applications. However, EDA-based emotion recognition with hundreds of
subjects still lacks an effective solution. In this paper, we attempt to fuse the
subject-individual EDA features and the externally evoked music features, and we
propose an end-to-end multimodal framework, the 1-dimensional residual temporal and
channel attention network (RTCAN-1D). For EDA features, the novel convex
optimization-based EDA (CvxEDA) method is applied to decompose EDA signals into
phasic and tonic components, mining their dynamic and steady characteristics. A
channel-temporal attention mechanism is introduced to EDA-based emotion recognition
for the first time to improve the temporal- and channel-wise representations. For
music features, we process the music signal with the open-source toolkit openSMILE
to obtain external feature vectors. The individual emotion features from EDA signals
and the external emotion benchmarks from music are fused in the classifying layers.
We have conducted systematic comparisons on three multimodal datasets (PMEmo, DEAP,
AMIGOS) for binary valence/arousal emotion recognition. Our proposed RTCAN-1D
outperforms existing state-of-the-art models, which validates that our work provides
a reliable and efficient solution for large-scale emotion recognition. Our code
has been released at https://github.com/guanghaoyin/RTCAN-1D.
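The following is a minimal sketch of the pipeline the abstract describes, not the authors' released RTCAN-1D implementation (see the GitHub link above for that). It assumes NeuroKit2's eda_phasic with method="cvxeda" for the phasic/tonic decomposition, the opensmile Python package (ComParE_2016 functionals, a 6373-dimensional vector, chosen here as an assumption since the paper only names openSMILE) for the music features, and a toy PyTorch channel/temporal attention head in place of the full residual backbone.

    # Hedged sketch; module sizes and names are illustrative assumptions.
    import neurokit2 as nk   # pip install neurokit2 cvxopt
    import opensmile         # pip install opensmile
    import torch
    import torch.nn as nn

    def extract_eda_features(eda_signal, sampling_rate=50):
        """Decompose raw EDA into phasic (dynamic) and tonic (steady) components via CvxEDA."""
        decomposed = nk.eda_phasic(eda_signal, sampling_rate=sampling_rate, method="cvxeda")
        # Stack the two components as channels of a 1-D sequence: (channels=2, time).
        return torch.tensor(
            decomposed[["EDA_Phasic", "EDA_Tonic"]].to_numpy().T, dtype=torch.float32
        )

    def extract_music_features(wav_path):
        """Utterance-level openSMILE functionals for the evoking music clip."""
        smile = opensmile.Smile(
            feature_set=opensmile.FeatureSet.ComParE_2016,   # assumed feature set
            feature_level=opensmile.FeatureLevel.Functionals,
        )
        return torch.tensor(smile.process_file(wav_path).to_numpy()[0], dtype=torch.float32)

    class ChannelTemporalAttention(nn.Module):
        """Toy channel-wise (squeeze-and-excitation style) plus temporal (softmax over time) attention."""
        def __init__(self, channels):
            super().__init__()
            self.channel_fc = nn.Sequential(
                nn.Linear(channels, channels), nn.ReLU(),
                nn.Linear(channels, channels), nn.Sigmoid(),
            )
            self.temporal_fc = nn.Conv1d(channels, 1, kernel_size=1)

        def forward(self, x):                                 # x: (batch, channels, time)
            ch_w = self.channel_fc(x.mean(dim=2))             # (batch, channels)
            x = x * ch_w.unsqueeze(-1)                        # re-weight channels
            t_w = torch.softmax(self.temporal_fc(x), dim=-1)  # (batch, 1, time)
            return (x * t_w).sum(dim=-1)                      # attention-pooled features

    class FusionClassifier(nn.Module):
        """Late fusion of attended EDA features with the external music vector."""
        def __init__(self, eda_channels=32, music_dim=6373, num_classes=2):
            super().__init__()
            self.eda_encoder = nn.Sequential(                 # stand-in for the residual 1-D backbone
                nn.Conv1d(2, eda_channels, kernel_size=7, padding=3), nn.ReLU(),
                nn.Conv1d(eda_channels, eda_channels, kernel_size=7, padding=3), nn.ReLU(),
            )
            self.attention = ChannelTemporalAttention(eda_channels)
            self.classifier = nn.Sequential(
                nn.Linear(eda_channels + music_dim, 128), nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, eda, music):                        # eda: (batch, 2, time); music: (batch, music_dim)
            h = self.attention(self.eda_encoder(eda))
            return self.classifier(torch.cat([h, music], dim=1))

For real experiments, per-subject normalization and the residual channel-temporal attention blocks from the released repository would replace the toy encoder above.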
Related papers
- Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion [11.122272456519227]
MEEtBrain is a portable and multimodal framework for emotion analysis (valence/arousal).
It integrates AI-generated music stimuli with EEG-fNIRS acquisition via a wireless headband.
A 14-hour dataset from 20 participants was collected to validate the framework's efficacy.
arXiv Detail & Related papers (2025-08-05T12:25:35Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework for disentangling identity from emotion and cooperating emotions with similar characteristics.
First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention.
Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks.
Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition [15.077653455298707]
This article presents our results for the eighth Affective Behavior Analysis in-the-wild (ABAW) competition.
We propose a multimodal emotion recognition method that fuses the features of Vision Transformer (ViT) and Residual Network (ResNet).
The results show that in scenarios with complex visual and audio cues, the model that fuses the features of ViT and ResNet exhibits superior performance.
arXiv Detail & Related papers (2025-03-21T18:03:44Z)
- Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors [63.194053817609024]
We introduce eye behaviors as important emotional cues for the creation of a new Eye-behavior-aided Multimodal Emotion Recognition (EMER) dataset.
For the first time, we provide annotations for both Emotion Recognition (ER) and Facial Expression Recognition (FER) in the EMER dataset.
We specifically design a new EMERT architecture to concurrently enhance performance in both ER and FER.
arXiv Detail & Related papers (2024-11-08T04:53:55Z)
- Emotion-Agent: Unsupervised Deep Reinforcement Learning with Distribution-Prototype Reward for Continuous Emotional EEG Analysis [2.1645626994550664]
Continuous electroencephalography (EEG) signals are widely used in affective brain-computer interface (aBCI) applications.
We propose a novel unsupervised deep reinforcement learning framework, called Emotion-Agent, to automatically identify relevant and informative emotional moments from EEG signals.
Emotion-Agent is trained using Proximal Policy Optimization (PPO) to achieve stable and efficient convergence.
arXiv Detail & Related papers (2024-08-22T04:29:25Z)
- EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving [64.58258341591929]
Auditory Referring Multi-Object Tracking (AR-MOT) is a challenging problem in autonomous driving.
We put forward EchoTrack, an end-to-end AR-MOT framework with dual-stream vision transformers.
We establish the first set of large-scale AR-MOT benchmarks.
arXiv Detail & Related papers (2024-02-28T12:50:16Z)
- DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment.
Current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images.
This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z)
- M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation [1.3864478040954673]
We propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from the visual, audio, and text modalities.
It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data.
The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from the audio and visual data.
arXiv Detail & Related papers (2022-06-05T14:18:58Z)
- Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and Latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
arXiv Detail & Related papers (2022-02-20T07:32:12Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Investigating EEG-Based Functional Connectivity Patterns for Multimodal Emotion Recognition [8.356765961526955]
We investigate three functional connectivity network features: strength, clustering coefficient, and eigenvector centrality.
The discrimination ability of the EEG connectivity features in emotion recognition is evaluated on three public EEG datasets.
We construct a multimodal emotion recognition model by combining the functional connectivity features from EEG and the features from eye movements or physiological signals.
arXiv Detail & Related papers (2020-04-04T16:51:56Z)
- An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos [64.91614454412257]
We propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs).
Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN.
arXiv Detail & Related papers (2020-02-12T15:33:59Z)