HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition
in the Wild
- URL: http://arxiv.org/abs/2007.12519v1
- Date: Fri, 24 Jul 2020 13:36:52 GMT
- Title: HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition
in the Wild
- Authors: Jing Chen (1), Chenhui Wang (2), Kejun Wang (1), Chaoqun Yin (1), Cong
Zhao (1), Tao Xu (1), Xinyi Zhang (1), Ziqiang Huang (1), Meichen Liu (1),
Tao Yang (1) ((1) College of Intelligent Systems Science and Engineering,
Harbin Engineering University, Harbin, China., (2) UCLA Department of
Statistics, Los Angeles, CA.)
- Abstract summary: We release a new natural state video database (called HEU Emotion)
HEU Emotion contains a total of 19,004 video clips and is divided into two parts according to the data source.
The recognition accuracies for the two parts increased by 2.19% and 4.01% respectively over those of single-modal facial expression recognition.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of affective computing in the wild is underpinned by
databases. Existing multimodal emotion databases collected under real-world
conditions are few and small, with a limited number of subjects and content
expressed in a single language. To address this gap, we collected, annotated,
and prepared to release a new natural state video database (called HEU
Emotion). HEU Emotion contains a total of 19,004 video clips and is divided
into two parts according to the data source. The first part contains videos
downloaded from Tumblr, Google, and Giphy, covering 10 emotions and two
modalities (facial expression and body posture). The second part consists of
clips manually extracted from movies, TV series, and variety shows, covering
10 emotions and three modalities (facial expression, body posture, and
emotional speech). HEU Emotion
is by far the most extensive multi-modal emotional database with 9,951
subjects. In order to provide a benchmark for emotion recognition, we used many
conventional machine learning and deep learning methods to evaluate HEU
Emotion. We proposed a Multi-modal Attention module to fuse multi-modal
features adaptively. After multi-modal fusion, the recognition accuracies for
the two parts increased by 2.19% and 4.01% respectively over those of
single-modal facial expression recognition.
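The Multi-modal Attention module itself is not detailed in this summary. Purely as an illustrative sketch of the general idea of adaptively weighting per-modality features (facial expression, body posture, emotional speech) before fusion, the PyTorch snippet below uses hypothetical names and dimensions and should not be read as the authors' implementation:

```python
# Minimal sketch of attention-weighted multi-modal feature fusion.
# NOT the paper's Multi-modal Attention module; it only illustrates
# learning adaptive per-modality weights before fusing features.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        # One scalar attention score per modality, computed from its features.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim), e.g. face / body / speech.
        scores = self.score(feats)              # (batch, M, 1)
        weights = torch.softmax(scores, dim=1)  # adaptive modality weights
        fused = (weights * feats).sum(dim=1)    # (batch, feat_dim)
        return fused


if __name__ == "__main__":
    face, body, speech = (torch.randn(8, 512) for _ in range(3))
    feats = torch.stack([face, body, speech], dim=1)  # (8, 3, 512)
    fusion = AttentionFusion(feat_dim=512)
    print(fusion(feats).shape)                        # torch.Size([8, 512])
```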
Related papers
- Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis [6.387263468033964]
We introduce a self-reviewed dataset and a human-reviewed dataset, comprising 24,137 coarse-grained samples and 3,500 manually annotated samples with detailed emotion annotations.
In addition to the audio modeling, we propose to explicitly integrate facial encoding models into the existing advanced Video MLLM.
Our Omni-Emotion achieves state-of-the-art performance in both emotion recognition and reasoning tasks.
arXiv Detail & Related papers (2025-01-16T12:27:05Z) - MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation [39.30784838378127]
The generation of talking avatars has achieved significant advancements in precise audio synchronization.
Current methods face fundamental challenges, including the lack of frameworks for modeling single basic emotional expressions.
We propose the Mixture of Emotion Experts (MoEE) model, which decouples six fundamental emotions to enable the precise synthesis of both singular and compound emotional states.
In conjunction with the DH-FaceEmoVid-150 dataset, we demonstrate that the MoEE framework excels in generating complex emotional expressions and nuanced facial details.
arXiv Detail & Related papers (2025-01-03T13:43:21Z) - Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data.
Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge.
We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z) - MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z) - Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model [5.301672905886949]
This report introduces a solution that uses MLLM technology to generate open-vocabulary emotion labels from a video.
In the MER-OV (Open-Vocabulary Emotion Recognition) track of the MER2024 challenge, our method achieved significant advantages, demonstrating its superior capability for complex emotion computation.
arXiv Detail & Related papers (2024-08-21T02:17:18Z) - EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding, incorporating two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z) - MAFW: A Large-scale, Multi-modal, Compound Affective Database for
Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment (a minimal multi-hot encoding sketch of such compound labels follows this list).
arXiv Detail & Related papers (2022-08-01T13:34:33Z) - M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z) - Emotion Recognition from Multiple Modalities: Fundamentals and
Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
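The MAFW entry above describes multi-label (compound) annotation over 11 emotion categories. As a minimal illustration, such a label can be encoded as a multi-hot vector; the category list is taken from the summary above, while the helper itself is hypothetical and not part of the MAFW release:

```python
# Hypothetical multi-hot encoding of MAFW-style compound emotion labels.
# The 11 categories come from the summary above; the helper is illustrative only.
MAFW_EMOTIONS = [
    "anger", "disgust", "fear", "happiness", "neutral", "sadness",
    "surprise", "contempt", "anxiety", "helplessness", "disappointment",
]


def encode_compound_label(emotions: list[str]) -> list[int]:
    """Return a multi-hot vector over the 11 MAFW emotion categories."""
    unknown = set(emotions) - set(MAFW_EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion label(s): {unknown}")
    return [int(e in emotions) for e in MAFW_EMOTIONS]


# A clip annotated with both "anxiety" and "sadness" (a compound emotion):
print(encode_compound_label(["anxiety", "sadness"]))
# -> [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
```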
This list is automatically generated from the titles and abstracts of the papers on this site.