Interpretable Multimodal Emotion Recognition using Facial Features and
Physiological Signals
- URL: http://arxiv.org/abs/2306.02845v1
- Date: Mon, 5 Jun 2023 12:57:07 GMT
- Title: Interpretable Multimodal Emotion Recognition using Facial Features and
Physiological Signals
- Authors: Puneet Kumar and Xiaobai Li
- Abstract summary: It introduces a multimodal framework for emotion understanding by fusing information from visual facial features and rPPG signals extracted from the input videos.
An interpretability technique based on permutation feature importance analysis has also been implemented.
- Score: 16.549488750320336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to demonstrate the importance and feasibility of fusing
multimodal information for emotion recognition. It introduces a multimodal
framework for emotion understanding by fusing the information from visual
facial features and rPPG signals extracted from the input videos. An
interpretability technique based on permutation feature importance analysis has
also been implemented to compute the contributions of rPPG and visual
modalities toward classifying a given input video into a particular emotion
class. The experiments on the IEMOCAP dataset demonstrate that the emotion
classification performance improves by combining the complementary information
from multiple modalities.
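
The interpretability analysis described above can be illustrated with a minimal sketch of modality-level permutation feature importance. It assumes a trained emotion classifier `model` with a scikit-learn-style `predict` method, ground-truth labels `y`, and pre-extracted per-video feature matrices `X_visual` and `X_rppg`; these names, the concatenation-based fusion, and the accuracy-drop scoring are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def modality_importance(model, X_visual, X_rppg, y, n_repeats=10, seed=0):
    """Estimate each modality's contribution by permuting its feature block
    across samples and measuring the drop in classification accuracy."""
    rng = np.random.default_rng(seed)

    def accuracy(Xv, Xr):
        preds = model.predict(np.concatenate([Xv, Xr], axis=1))
        return float(np.mean(preds == y))

    baseline = accuracy(X_visual, X_rppg)
    importances = {}
    for name in ("visual", "rppg"):
        drops = []
        for _ in range(n_repeats):
            perm = rng.permutation(len(y))  # shuffle one modality across samples
            Xv = X_visual[perm] if name == "visual" else X_visual
            Xr = X_rppg[perm] if name == "rppg" else X_rppg
            drops.append(baseline - accuracy(Xv, Xr))
        importances[name] = float(np.mean(drops))  # larger drop => larger contribution
    return baseline, importances
```

A larger accuracy drop when one modality's features are shuffled indicates a larger contribution of that modality to the emotion prediction.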
Related papers
- PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning [116.33775552866476]
Generalized zero-shot learning (GZSL) endeavors to identify the unseen using knowledge from the seen domain.
GZSL suffers from insufficient visual-semantic correspondences due to attribute diversity and instance diversity.
We propose a multi-granularity progressive semantic-visual adaption network, where sufficient visual elements can be gathered to remedy the inconsistency.
arXiv Detail & Related papers (2024-10-15T12:49:33Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition [14.639340916340801]
We propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition (AR-IIGCN) method.
Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces.
Secondly, we build a generator and a discriminator for the three modal features through adversarial representation.
Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information.
arXiv Detail & Related papers (2023-12-28T01:57:26Z)
- EMERSK -- Explainable Multimodal Emotion Recognition with Situational
Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK).
EMERSK is a general system for human emotion recognition and explanation using visual information.
Our system can handle multiple modalities, including facial expressions, posture, and gait in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z)
- Interpretable Multimodal Emotion Recognition using Hybrid Fusion of
Speech and Image Data [15.676632465869346]
A new interpretability technique has been developed to identify the important speech & image features leading to the prediction of particular emotion classes.
The proposed system has achieved 83.29% accuracy for emotion recognition.
arXiv Detail & Related papers (2022-08-25T04:43:34Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Multi-modal Text Recognition Networks: Interactive Enhancements between
Visual and Semantic Features [11.48760300147023]
This paper introduces a novel method, called Multi-modAl Text Recognition Network (MATRN).
MATRN identifies visual and semantic feature pairs and encodes spatial information into semantic features.
Our experiments demonstrate that MATRN achieves state-of-the-art performances on seven benchmarks with large margins.
arXiv Detail & Related papers (2021-11-30T10:22:11Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal
Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Attentive Cross-modal Connections for Deep Multimodal Wearable-based
Emotion Recognition [7.559720049837459]
We present a novel attentive cross-modal connection to share information between convolutional neural networks.
Specifically, these connections improve emotion classification by sharing intermediate representations between EDA and ECG.
Our experiments show that the proposed approach is capable of learning strong multimodal representations and outperforms a number of baseline methods.
arXiv Detail & Related papers (2021-08-04T18:40:32Z)
- Emotion pattern detection on facial videos using functional statistics [62.997667081978825]
We propose a technique based on Functional ANOVA to extract significant patterns of face muscles movements.
We determine if there are time-related differences on expressions among emotional groups by using a functional F-test.
arXiv Detail & Related papers (2021-03-01T08:31:08Z)
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
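
Several of the related papers above, like the main framework, combine modality-specific models through late fusion of their outputs (for example, the transfer-learning framework over speech and text). The sketch below shows the generic pattern under assumed inputs: two already-trained models that return per-class probability matrices for the same samples; the fixed weights are an illustrative choice, not any particular paper's method.

```python
import numpy as np

def late_fusion(prob_speech, prob_text, weights=(0.5, 0.5)):
    """Weighted late fusion of per-class probabilities from two modality models.

    prob_speech, prob_text: arrays of shape (n_samples, n_classes), each row
    summing to 1; weights control the relative trust placed in each modality.
    """
    w_s, w_t = weights
    fused = w_s * np.asarray(prob_speech) + w_t * np.asarray(prob_text)
    fused = fused / fused.sum(axis=1, keepdims=True)  # renormalize to a distribution
    return fused.argmax(axis=1), fused                # predicted class ids and fused scores
```

In practice the weights would be tuned on a validation split, or replaced by a small meta-classifier trained on the concatenated probabilities.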
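
The functional-statistics entry above tests for time-related differences in facial-movement curves across emotion groups with a functional F-test. As a rough, simplified stand-in, the sketch below runs a pointwise one-way ANOVA at every time step over per-group collections of aligned curves; the data layout is an assumption, and a true functional F-test would instead compare smoothed group-mean functions.

```python
import numpy as np
from scipy.stats import f_oneway

def pointwise_f_test(groups):
    """Pointwise one-way ANOVA over aligned facial-movement time series.

    groups: list of arrays, one per emotion class, each of shape
    (n_subjects, n_timesteps). Returns per-timestep F statistics and p-values.
    """
    n_timesteps = groups[0].shape[1]
    f_vals, p_vals = [], []
    for t in range(n_timesteps):
        f, p = f_oneway(*[g[:, t] for g in groups])
        f_vals.append(f)
        p_vals.append(p)
    return np.array(f_vals), np.array(p_vals)
```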