FAF: A novel multimodal emotion recognition approach integrating face,
body and text
- URL: http://arxiv.org/abs/2211.15425v1
- Date: Sun, 20 Nov 2022 14:43:36 GMT
- Title: FAF: A novel multimodal emotion recognition approach integrating face,
body and text
- Authors: Zhongyu Fang, Aoyun He, Qihui Yu, Baopeng Gao, Weiping Ding, Tong
Zhang, Lei Ma
- Abstract summary: We develop a large multimodal emotion dataset, named the "HED" dataset, to facilitate the emotion recognition task.
To improve recognition accuracy, a "Feature After Feature" framework is used to extract crucial emotional information.
We evaluate various benchmark methods on the "HED" dataset and compare their performance with that of our method.
- Score: 13.485538135494153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal emotion analysis achieves better recognition performance by
drawing on more comprehensive emotional cues and multimodal emotion datasets. In
this paper, we develop a large multimodal emotion dataset, named the "HED"
dataset, to facilitate the emotion recognition task, and accordingly propose a
multimodal emotion recognition method. To improve recognition accuracy, a
"Feature After Feature" framework is used to extract crucial emotional
information from the aligned face, body and text samples. We evaluate various
benchmark methods on the "HED" dataset and compare their performance with that
of our method. The results show that the five-class classification accuracy of
the proposed multimodal fusion method is about 83.75%, an improvement of 1.83%,
9.38%, and 21.62% over the individual modalities, respectively. The
complementarity between the channels is thus effectively exploited to improve
emotion recognition performance. We have also built a multimodal online emotion
prediction platform that provides free emotion prediction to more users.
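As a loose illustration of the fusion idea described above: the abstract does not spell out the "Feature After Feature" architecture, so the sketch below uses assumed feature dimensions and a simple projection-plus-concatenation fusion head rather than the paper's actual framework.

```python
# A minimal sketch of fusing pre-extracted face, body and text features for
# five-class emotion prediction. NOT the paper's "Feature After Feature"
# framework; the dimensions and the concatenation fusion are assumptions.
import torch
import torch.nn as nn


class SimpleFusionClassifier(nn.Module):
    def __init__(self, face_dim=512, body_dim=256, text_dim=768, num_classes=5):
        super().__init__()
        # Project each modality into a shared 128-d space.
        self.face_proj = nn.Linear(face_dim, 128)
        self.body_proj = nn.Linear(body_dim, 128)
        self.text_proj = nn.Linear(text_dim, 128)
        # Classify over the concatenated modality embeddings.
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * 128, num_classes))

    def forward(self, face_feat, body_feat, text_feat):
        fused = torch.cat(
            [self.face_proj(face_feat),
             self.body_proj(body_feat),
             self.text_proj(text_feat)],
            dim=-1,
        )
        return self.head(fused)  # logits over the emotion classes


# Usage with random stand-in features for a batch of 4 aligned samples.
model = SimpleFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 256), torch.randn(4, 768))
print(logits.argmax(dim=-1))
```

A staged combination of the projected features (adding one modality's features after another) would be the natural place to slot in the paper's "Feature After Feature" idea; the concatenation above is only the simplest baseline.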
Related papers
- GCM-Net: Graph-enhanced Cross-Modal Infusion with a Metaheuristic-Driven Network for Video Sentiment and Emotion Analysis [2.012311338995539]
This paper presents a novel framework that leverages multi-modal contextual information from utterances and applies metaheuristic algorithms for utterance-level sentiment and emotion prediction.
To show the effectiveness of our approach, we have conducted extensive evaluations on three prominent multimodal benchmark datasets.
arXiv Detail & Related papers (2024-10-02T10:07:48Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
We introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories.
This dataset enables models to learn from varied scenarios and generalize to real-world applications.
We propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders.
arXiv Detail & Related papers (2024-06-17T03:01:22Z)
- Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations [15.705757672984662]
Multimodal Emotion Recognition in Conversations (MERC) is a significant development direction for machine intelligence.
Much of the data in MERC naturally exhibits an imbalanced distribution of emotion categories, and researchers often ignore the negative impact of this imbalance on emotion recognition.
We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data.
We have conducted extensive experiments on the IEMOCAP and MELD benchmark datasets, and the results show that CBERL improves the effectiveness of emotion recognition.
arXiv Detail & Related papers (2023-12-11T12:35:17Z)
- An Empirical Study and Improvement for Speech Emotion Recognition [22.250228893114066]
Multimodal speech emotion recognition aims to detect speakers' emotions from audio and text.
In this work, we consider a simple yet important problem: how to fuse audio and text modality information.
Empirical results show our method obtained new state-of-the-art results on the IEMOCAP dataset.
arXiv Detail & Related papers (2023-04-08T03:24:06Z)
- Multimodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos [16.28109151595872]
We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW).
We exploit multimodal features extracted from videos of different lengths in the competition dataset, including audio, pose and images.
Our system achieves the performance of 0.361 on the validation dataset.
arXiv Detail & Related papers (2023-03-18T14:08:06Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction (a generic sketch of this prompt idea appears after this list).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition [62.48806555665122]
We describe our approach in EmotiW 2019, which mainly explores emotion features and feature fusion strategies for the audio and visual modalities.
With careful evaluation, we obtain 65.5% on the AFEW validation set and 62.48% on the test set and rank third in the challenge.
arXiv Detail & Related papers (2020-12-27T10:50:24Z)
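The prompt-based reformulation mentioned in the MEmoBERT entry above can be illustrated generically: instead of training a classification head, candidate emotion words are scored at a masked position of a templated sentence. This is not MEmoBERT itself; the checkpoint, the prompt template and the label words below are assumptions chosen only for illustration.

```python
# Generic prompt-style emotion prediction with an off-the-shelf masked LM.
# The template and label words are placeholders, not MEmoBERT's actual setup.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

LABEL_WORDS = ["happy", "sad", "angry", "neutral"]  # hypothetical label set


def predict_emotion(utterance: str) -> str:
    # Reformulate classification as predicting the masked emotion word.
    prompt = f"{utterance} I feel [MASK]."
    scores = {c["token_str"]: c["score"] for c in fill(prompt, targets=LABEL_WORDS)}
    return max(scores, key=scores.get)


print(predict_emotion("I finally passed the exam!"))
```

Note that this text-only sketch omits the multimodal conditioning that MEmoBERT adds on top of the prompt formulation.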
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.