HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild
- URL: http://arxiv.org/abs/2007.12519v1
- Date: Fri, 24 Jul 2020 13:36:52 GMT
- Title: HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild
- Authors: Jing Chen (1), Chenhui Wang (2), Kejun Wang (1), Chaoqun Yin (1), Cong Zhao (1), Tao Xu (1), Xinyi Zhang (1), Ziqiang Huang (1), Meichen Liu (1), Tao Yang (1) ((1) College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China; (2) UCLA Department of Statistics, Los Angeles, CA)
- Abstract summary: We release a new natural-state video database called HEU Emotion.
HEU Emotion contains a total of 19,004 video clips, divided into two parts according to the data source.
After multi-modal fusion, the recognition accuracies for the two parts increased by 2.19% and 4.01%, respectively, over single-modal facial expression recognition.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of affective computing in the wild is underpinned by databases. Existing multimodal emotion databases collected under real-world conditions are few and small, with a limited number of subjects and content expressed in a single language. To address this need, we collected, annotated, and prepared to release a new natural-state video database called HEU Emotion. HEU Emotion contains a total of 19,004 video clips, divided into two parts according to the data source. The first part contains videos downloaded from Tumblr, Google, and Giphy, covering 10 emotions and two modalities (facial expression and body posture). The second part consists of clips taken manually from movies, TV series, and variety shows, covering 10 emotions and three modalities (facial expression, body posture, and emotional speech). With 9,951 subjects, HEU Emotion is by far the most extensive multi-modal emotion database. To provide a benchmark for emotion recognition, we evaluated HEU Emotion with many conventional machine learning and deep learning methods. We also proposed a Multi-modal Attention module to fuse multi-modal features adaptively. After multi-modal fusion, the recognition accuracies for the two parts increased by 2.19% and 4.01%, respectively, over those of single-modal facial expression recognition.
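The abstract only states that a Multi-modal Attention module fuses multi-modal features adaptively; its exact architecture is not described in this listing. The sketch below is a minimal, hypothetical attention-weighted fusion layer over per-modality feature vectors (e.g., facial expression, body posture, and speech embeddings), intended only to illustrate what adaptive fusion of this kind can look like. All class names, dimensions, and hyperparameters are assumptions, not the authors' implementation.

```python
# Minimal sketch of attention-based multi-modal fusion (NOT the authors' module).
# Assumes each modality (face, posture, speech) has already been encoded into a
# fixed-size feature vector by its own backbone network.
import torch
import torch.nn as nn


class MultiModalAttentionFusion(nn.Module):
    """Fuse per-modality embeddings with learned, input-dependent attention weights."""

    def __init__(self, feat_dim: int = 512, hidden_dim: int = 128):
        super().__init__()
        # Small scoring MLP: produces one scalar relevance score per modality.
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim)
        scores = self.score(feats)              # (batch, num_modalities, 1)
        weights = torch.softmax(scores, dim=1)  # attention over modalities
        return (weights * feats).sum(dim=1)     # (batch, feat_dim) fused feature


if __name__ == "__main__":
    # Hypothetical 512-d features for face, body posture, and emotional speech.
    face, posture, speech = (torch.randn(8, 512) for _ in range(3))
    fusion = MultiModalAttentionFusion(feat_dim=512)
    fused = fusion(torch.stack([face, posture, speech], dim=1))
    print(fused.shape)  # torch.Size([8, 512])
```

In this sketch a downstream classifier would consume the fused vector; the softmax over modalities lets the network down-weight a noisy modality (e.g., an occluded face) on a per-sample basis, which is one plausible reading of "adaptive" fusion.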
Related papers
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- Generative Emotion Cause Explanation in Multimodal Conversations [23.39751445330256]
We propose a new task, Multimodal Conversation Emotion Cause Explanation (MCECE).
It aims to generate a detailed explanation of the emotional cause of a target utterance within a multimodal conversation scenario.
A novel approach, FAME-Net, is proposed that harnesses the power of Large Language Models (LLMs) to analyze visual data and accurately interpret the emotions conveyed through facial expressions in videos.
arXiv Detail & Related papers (2024-11-01T09:16:30Z)
- Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model [5.301672905886949]
This report introduces a solution that uses MLLM technology to generate open-vocabulary emotion labels from a video.
In the MER-OV (Open-Vocabulary Emotion Recognition) track of the MER2024 challenge, our method achieved significant advantages, demonstrating its superior capability in complex emotion computation.
arXiv Detail & Related papers (2024-08-21T02:17:18Z)
- Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset [74.74686464187474]
Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history.
MC-EIU is an enabling technology for many human-computer interfaces.
We propose the MC-EIU dataset, which features 7 emotion categories, 9 intent categories, 3 modalities (textual, acoustic, and visual), and two languages (English and Mandarin).
arXiv Detail & Related papers (2024-07-03T01:56:00Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
However, their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model [22.292581935835678]
We construct a dataset for Emotion Analysis in Long-sequential and De-identity videos called EALD.
We also provide the Non-Facial Body Language (NFBL) annotations for each player.
NFBL is an internally driven emotional expression and can serve as an identity-free clue for understanding a person's emotional state.
arXiv Detail & Related papers (2024-05-01T15:25:54Z)
- MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
arXiv Detail & Related papers (2022-08-01T13:34:33Z)
- M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)