Mutilmodal Feature Extraction and Attention-based Fusion for Emotion
Estimation in Videos
- URL: http://arxiv.org/abs/2303.10421v1
- Date: Sat, 18 Mar 2023 14:08:06 GMT
- Title: Mutilmodal Feature Extraction and Attention-based Fusion for Emotion
Estimation in Videos
- Authors: Tao Shu, Xinke Wang, Ruotong Wang, Chuang Chen, Yixin Zhang, Xiao Sun
- Abstract summary: We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW)
We exploited multimodal features extracted from videos of different lengths in the competition dataset, including audio, pose, and images.
Our system achieves a performance of 0.361 on the validation dataset.
- Score: 16.28109151595872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The continuous improvement of human-computer interaction technology
makes it possible to compute emotions. In this paper, we introduce our
submission to the CVPR 2023 Competition on Affective Behavior Analysis
in-the-wild (ABAW). Sentiment analysis in human-computer interaction should,
as far as possible, start from multiple dimensions, compensate for any single
imperfect emotion channel, and finally determine the emotion tendency by
fitting multiple results. Therefore, we exploited multimodal features
extracted from videos of different lengths in the competition dataset,
including audio, pose, and images. These well-informed emotion representations
drive us to propose an attention-based multimodal framework for emotion
estimation. Our system achieves a performance of 0.361 on the validation
dataset. The code is available at https://github.com/xkwangcn/ABAW-5th-RT-IAI.
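To make the fusion idea in the abstract concrete, the sketch below combines pre-extracted audio, pose, and image features with a learned attention weight per modality and regresses valence and arousal. This is a minimal illustrative sketch, not the authors' implementation: the feature dimensions, module names (AttentionFusion, proj, score, head), the two-output regression head, and the use of PyTorch are assumptions; the actual code is in the linked repository.

```python
# Hypothetical sketch of attention-based multimodal fusion for emotion
# estimation. Dimensions, module names, and the choice of PyTorch are
# illustrative assumptions, not details taken from the paper or its repository.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuse per-clip audio, pose, and image features with learned attention weights."""

    def __init__(self, audio_dim=128, pose_dim=64, image_dim=512, hidden_dim=256):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj = nn.ModuleDict({
            "audio": nn.Linear(audio_dim, hidden_dim),
            "pose": nn.Linear(pose_dim, hidden_dim),
            "image": nn.Linear(image_dim, hidden_dim),
        })
        # Score each projected modality; softmax over modalities gives attention weights.
        self.score = nn.Linear(hidden_dim, 1)
        # Regress two continuous outputs (e.g., valence and arousal) from the fused vector.
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, audio, pose, image):
        # Each input: (batch, modality_dim)
        feats = torch.stack([
            torch.tanh(self.proj["audio"](audio)),
            torch.tanh(self.proj["pose"](pose)),
            torch.tanh(self.proj["image"](image)),
        ], dim=1)                                           # (batch, 3, hidden_dim)
        weights = torch.softmax(self.score(feats), dim=1)   # (batch, 3, 1)
        fused = (weights * feats).sum(dim=1)                # (batch, hidden_dim)
        return self.head(fused)                             # (batch, 2)


if __name__ == "__main__":
    model = AttentionFusion()
    out = model(torch.randn(4, 128), torch.randn(4, 64), torch.randn(4, 512))
    print(out.shape)  # torch.Size([4, 2])
```

A modality-level softmax of this kind lets the model down-weight an unreliable channel (for example, noisy audio) on a per-clip basis, which matches the abstract's goal of compensating for any single imperfect emotion channel.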
Related papers
- Multimodal Group Emotion Recognition In-the-wild Using Privacy-Compliant Features [0.0]
Group-level emotion recognition can be useful in many fields including social robotics, conversational agents, e-coaching and learning analytics.
This paper explores privacy-compliant group-level emotion recognition "in-the-wild" within the EmotiW Challenge 2023.
arXiv Detail & Related papers (2023-12-06T08:58:11Z)
- Computer Vision Estimation of Emotion Reaction Intensity in the Wild [1.5481864635049696]
We describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge.
We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity.
arXiv Detail & Related papers (2023-03-19T19:09:41Z)
- FAF: A novel multimodal emotion recognition approach integrating face, body and text [13.485538135494153]
We develop a large multimodal emotion dataset, named "HED" dataset, to facilitate the emotion recognition task.
To promote recognition accuracy, the "Feature After Feature" framework was used to explore crucial emotional information.
We employ various benchmarks to evaluate the "HED" dataset and compare the performance with our method.
arXiv Detail & Related papers (2022-11-20T14:43:36Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
- Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition [62.48806555665122]
We describe our approaches in EmotiW 2019, which mainly explores emotion features and feature fusion strategies for audio and visual modality.
With careful evaluation, we obtain 65.5% on the AFEW validation set and 62.48% on the test set, ranking third in the challenge.
arXiv Detail & Related papers (2020-12-27T10:50:24Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- Temporal aggregation of audio-visual modalities for emotion recognition [0.5352699766206808]
We propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality (a simplified sketch of this per-modality windowing appears after this list).
Our proposed method outperforms other methods from the literature and human accuracy ratings.
arXiv Detail & Related papers (2020-07-08T18:44:15Z)
- Context Based Emotion Recognition using EMOTIC Dataset [22.631542327834595]
We present EMOTIC, a dataset of images of people annotated with their apparent emotion.
Using the EMOTIC dataset we train different CNN models for emotion recognition.
Our results show how scene context provides important information to automatically recognize emotional states.
arXiv Detail & Related papers (2020-03-30T12:38:50Z)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)
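As referenced in the temporal aggregation entry above, the sketch below shows one simple way audio and visual feature sequences could be windowed with different temporal offsets per modality before fusion. The window length, offset values, feature dimensions, and the function name windowed_pair are hypothetical choices for illustration, not details taken from that paper.

```python
# Hypothetical illustration of pairing audio and visual feature sequences with
# modality-specific temporal offsets before fusion. Window length, offsets,
# and feature dimensions are made-up values for demonstration only.
import numpy as np


def windowed_pair(audio_seq, visual_seq, t, window=8, audio_offset=2, visual_offset=0):
    """Return concatenated audio/visual features for a window anchored at frame t.

    audio_seq:  (num_frames, audio_dim) per-frame audio features
    visual_seq: (num_frames, visual_dim) per-frame visual features
    Each modality is read from its own offset relative to t, so one modality
    can be sampled slightly ahead of (or behind) the other.
    """
    a = audio_seq[t + audio_offset : t + audio_offset + window]     # (window, audio_dim)
    v = visual_seq[t + visual_offset : t + visual_offset + window]  # (window, visual_dim)
    return np.concatenate([a, v], axis=-1)                          # (window, audio_dim + visual_dim)


if __name__ == "__main__":
    audio = np.random.randn(100, 40)    # e.g. 100 frames of 40-dim audio features
    visual = np.random.randn(100, 512)  # e.g. 100 frames of 512-dim visual features
    fused_window = windowed_pair(audio, visual, t=10)
    print(fused_window.shape)  # (8, 552)
```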
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.