Interpretable Multimodal Emotion Recognition using Facial Features and Physiological Signals
- URL: http://arxiv.org/abs/2306.02845v1
- Date: Mon, 5 Jun 2023 12:57:07 GMT
- Title: Interpretable Multimodal Emotion Recognition using Facial Features and Physiological Signals
- Authors: Puneet Kumar and Xiaobai Li
- Abstract summary: It introduces a multimodal framework for emotion understanding by fusing information from visual facial features and rPPG signals extracted from the input videos.
An interpretability technique based on permutation feature importance analysis has also been implemented.
- Score: 16.549488750320336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to demonstrate the importance and feasibility of fusing
multimodal information for emotion recognition. It introduces a multimodal
framework for emotion understanding by fusing the information from visual
facial features and rPPG signals extracted from the input videos. An
interpretability technique based on permutation feature importance analysis has
also been implemented to compute the contributions of rPPG and visual
modalities toward classifying a given input video into a particular emotion
class. The experiments on the IEMOCAP dataset demonstrate that the emotion
classification performance improves by combining the complementary information
from multiple modalities.
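To make the interpretability step concrete, below is a minimal sketch of permutation-based modality importance. The data, feature dimensions, and stand-in classifier are all illustrative assumptions; the paper's actual rPPG/visual features and fusion model are not reproduced here.

```python
# Minimal sketch of permutation-based modality importance.
# Stand-in data and classifier; the paper's actual rPPG/visual
# features and fusion model are NOT reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n, d_vis, d_rppg = 400, 32, 8                # hypothetical sizes
X_vis = rng.normal(size=(n, d_vis))          # visual facial features
X_rppg = rng.normal(size=(n, d_rppg))        # rPPG features
y = (X_vis[:, 0] + 0.5 * X_rppg[:, 0] > 0).astype(int)  # toy labels

X = np.hstack([X_vis, X_rppg])
clf = LogisticRegression().fit(X, y)         # stand-in fusion classifier
base = accuracy_score(y, clf.predict(X))

def modality_importance(cols, repeats=20):
    """Mean accuracy drop when one modality's columns are shuffled."""
    drops = []
    for _ in range(repeats):
        Xp = X.copy()
        Xp[:, cols] = Xp[rng.permutation(n)][:, cols]  # break link to y
        drops.append(base - accuracy_score(y, clf.predict(Xp)))
    return float(np.mean(drops))

vis_cols = np.arange(d_vis)
rppg_cols = np.arange(d_vis, d_vis + d_rppg)
print("visual importance:", modality_importance(vis_cols))
print("rPPG importance:  ", modality_importance(rppg_cols))
```

Shuffling one modality's features across samples severs their relationship to the labels, so the resulting accuracy drop estimates that modality's contribution to the prediction.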
Related papers
- Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition [16.616341358877243]
This study addresses the challenges of emotion recognition by integrating facial expression analysis with electroencephalogram (EEG) signals.
The proposed framework employs a transformer-based fusion approach to effectively integrate visual and physiological modalities.
A key innovation of this work is the adoption of a multiple instance learning (MIL) approach, which extracts meaningful information from multiple facial expression images.
arXiv Detail & Related papers (2025-02-01T20:32:57Z)
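As a rough illustration of the multiple instance learning idea in the Milmer entry above, the sketch below pools a bag of facial-frame embeddings with attention and concatenates the result with an EEG feature vector. All dimensions and the final linear head are hypothetical; Milmer's transformer-based fusion is not reproduced.

```python
# Minimal sketch of attention-based MIL pooling over a bag of facial-frame
# embeddings, followed by late concatenation with an EEG feature vector.
# Sizes and the classifier head are hypothetical stand-ins.
import torch
import torch.nn as nn

class MILPooling(nn.Module):
    def __init__(self, d_frame=512, d_attn=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_frame, d_attn), nn.Tanh(), nn.Linear(d_attn, 1))

    def forward(self, bag):                        # bag: (n_frames, d_frame)
        w = torch.softmax(self.score(bag), dim=0)  # attention over instances
        return (w * bag).sum(dim=0)                # bag-level embedding

frames = torch.randn(16, 512)   # 16 facial-frame embeddings (stand-in)
eeg = torch.randn(64)           # one EEG feature vector (stand-in)
pooled = MILPooling()(frames)
logits = nn.Linear(512 + 64, 7)(torch.cat([pooled, eeg]))  # 7 emotion classes
```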
- Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data.
Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge.
We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
- PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning [116.33775552866476]
Generalized zero-shot learning (GZSL) endeavors to identify the unseen using knowledge from the seen domain.
GZSL suffers from insufficient visual-semantic correspondences due to attribute diversity and instance diversity.
We propose a multi-granularity progressive semantic-visual adaption network, where sufficient visual elements can be gathered to remedy the inconsistency.
arXiv Detail & Related papers (2024-10-15T12:49:33Z)
- Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition [14.639340916340801]
We propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning (AR-IIGCN) method for multimodal emotion recognition.
Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces.
Secondly, we build a generator and a discriminator for the three modal features through adversarial representation.
Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information.
arXiv Detail & Related papers (2023-12-28T01:57:26Z)
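The inter-modal contrastive step in the AR-IIGCN entry above can be illustrated with a standard InfoNCE loss that pulls paired embeddings from two modalities together and pushes mismatched pairs apart. The graph construction and the adversarial generator/discriminator are omitted, and all sizes are hypothetical.

```python
# Minimal sketch of an inter-modal contrastive (InfoNCE) loss: the i-th
# video embedding and the i-th audio embedding form the positive pair.
# AR-IIGCN's graph and adversarial components are NOT reproduced.
import torch
import torch.nn.functional as F

def inter_modal_nce(za, zb, tau=0.1):
    """za, zb: (batch, dim) paired embeddings from two modalities."""
    za, zb = F.normalize(za, dim=1), F.normalize(zb, dim=1)
    logits = za @ zb.t() / tau             # cosine similarity matrix
    targets = torch.arange(za.size(0))     # diagonal entries are positives
    return F.cross_entropy(logits, targets)

video = torch.randn(8, 256)   # per-utterance embeddings after the MLPs
audio = torch.randn(8, 256)
loss = inter_modal_nce(video, audio)
```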
- EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK).
EMERSK is a general system for human emotion recognition and explanation using visual information.
Our system can handle multiple modalities, including facial expressions, posture, and gait, in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z)
- Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data [15.676632465869346]
A new interpretability technique has been developed to identify the important speech & image features leading to the prediction of particular emotion classes.
The proposed system has achieved 83.29% accuracy for emotion recognition.
arXiv Detail & Related papers (2022-08-25T04:43:34Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
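A minimal sketch of the late-fusion idea from the entry above: class probabilities produced by separately fine-tuned speech and text models are concatenated and passed to a small learned fusion head. Random tensors stand in for the backbone outputs, and the fusion head is a hypothetical choice.

```python
# Minimal sketch of late fusion: per-modality class probabilities are
# concatenated and re-scored by a small learned head. The speaker-recognition
# and BERT backbones are stood in by random tensors.
import torch
import torch.nn as nn

n_classes = 4                                    # e.g. a 4-class IEMOCAP setup
speech_probs = torch.softmax(torch.randn(8, n_classes), dim=1)  # stand-in
text_probs = torch.softmax(torch.randn(8, n_classes), dim=1)    # stand-in

fusion = nn.Linear(2 * n_classes, n_classes)     # learned late-fusion head
logits = fusion(torch.cat([speech_probs, text_probs], dim=1))
pred = logits.argmax(dim=1)                      # fused emotion predictions
```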
- Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features [11.48760300147023]
This paper introduces a novel method called the Multi-modAl Text Recognition Network (MATRN).
MATRN identifies visual and semantic feature pairs and encodes spatial information into semantic features.
Our experiments demonstrate that MATRN achieves state-of-the-art performance on seven benchmarks by large margins.
arXiv Detail & Related papers (2021-11-30T10:22:11Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, fine-tune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
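The prompt-based reformulation in the MEmoBERT entry above can be sketched with a plain text-only masked language model: emotion classification becomes predicting an emotion word at a mask position. The prompt wording and emotion vocabulary are hypothetical, and generic bert-base-uncased stands in for the multimodal MEmoBERT.

```python
# Minimal sketch of prompt-based emotion classification as masked-word
# prediction. bert-base-uncased is a text-only stand-in for MEmoBERT;
# the prompt and emotion word list are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

utterance = "I can't believe we finally won the game!"
prompt = f"{utterance} I am {tok.mask_token}."
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = mlm(**inputs).logits
mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero().item()

# Score only a small emotion vocabulary at the mask position.
emotions = ["happy", "sad", "angry", "neutral"]
ids = [tok.convert_tokens_to_ids(w) for w in emotions]
scores = logits[0, mask_pos, ids].softmax(dim=0)
print(dict(zip(emotions, scores.tolist())))
```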
- Emotion pattern detection on facial videos using functional statistics [62.997667081978825]
We propose a technique based on Functional ANOVA to extract significant patterns of facial muscle movements.
We determine whether there are time-related differences in expressions among emotional groups by using a functional F-test.
arXiv Detail & Related papers (2021-03-01T08:31:08Z)
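To illustrate the functional-statistics entry above, the sketch below runs a pointwise one-way ANOVA over synthetic face-movement trajectories. A true functional F-test aggregates evidence over the whole curve, so this pointwise version is only an approximation on made-up data.

```python
# Minimal sketch: pointwise one-way ANOVA across time on synthetic
# face-movement curves for three emotion groups. A proper functional
# F-test pools evidence over the entire curve; this is an approximation.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)                       # normalized time
happy = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (20, 100))
sad = 0.3 * np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (20, 100))
angry = np.cos(2 * np.pi * t) + rng.normal(0, 0.3, (20, 100))

p = np.array([f_oneway(happy[:, i], sad[:, i], angry[:, i]).pvalue
              for i in range(len(t))])
print("fraction of time points with p < 0.01:", float((p < 0.01).mean()))
```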
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that a machine be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
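A minimal sketch of the two-stage pipeline in the entry above: bottleneck features, here random stand-ins for the convolutional autoencoder's embeddings, feed a Support Vector Regressor that predicts a continuous affect value such as valence.

```python
# Minimal sketch of the two-stage idea: autoencoder bottleneck features
# feed an SVR predicting a continuous affect value (e.g. valence).
# Random vectors stand in for the learned convolutional embeddings.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
Z = rng.normal(size=(300, 64))          # stand-in bottleneck embeddings
valence = np.tanh(Z[:, 0] + 0.5 * Z[:, 1]) + rng.normal(0, 0.05, 300)

svr = SVR(kernel="rbf").fit(Z[:250], valence[:250])   # train on 250 samples
print("held-out R^2:", svr.score(Z[250:], valence[250:]))
```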
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.