MAFW: A Large-scale, Multi-modal, Compound Affective Database for
Dynamic Facial Expression Recognition in the Wild
- URL: http://arxiv.org/abs/2208.00847v2
- Date: Mon, 14 Aug 2023 05:22:41 GMT
- Title: MAFW: A Large-scale, Multi-modal, Compound Affective Database for
Dynamic Facial Expression Recognition in the Wild
- Authors: Yuanyuan Liu, Wei Dai, Chuanxu Feng, Wenbin Wang, Guanghao Yin, Jiabei
Zeng and Shiguang Shan
- Abstract summary: We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
- Score: 56.61912265155151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic facial expression recognition (FER) databases provide important data
support for affective computing and applications. However, most FER databases
are annotated with several basic mutually exclusive emotional categories and
contain only one modality, e.g., videos. Such limited labels and a single
modality cannot accurately reflect human emotions or support real-world
applications. In this paper, we propose MAFW, a large-scale multi-modal compound
affective database with 10,045 video-audio clips in the wild. Each clip is
annotated with a compound emotional category and a couple of sentences that
describe the subjects' affective behaviors in the clip. For the compound
emotion annotation, each clip is categorized into one or more of the 11
widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness,
surprise, contempt, anxiety, helplessness, and disappointment. To ensure high
quality of the labels, we filter out the unreliable annotations by an
Expectation Maximization (EM) algorithm, and then obtain 11 single-label
emotion categories and 32 multi-label emotion categories. To the best of our
knowledge, MAFW is the first in-the-wild multi-modal database annotated with
compound emotion labels and emotion-related captions. Additionally, we propose
a novel Transformer-based expression snippet feature learning method that
recognizes compound emotions by leveraging the expression-change relations
among different emotions and modalities. Extensive experiments on the MAFW
database demonstrate the advantages of the proposed method over other
state-of-the-art methods for both uni-modal and multi-modal FER. Our MAFW database
is publicly available from https://mafw-database.github.io/MAFW.
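As a rough illustration of the EM-based annotation filtering mentioned in the abstract, the sketch below implements a generic Dawid-Skene-style EM that estimates per-annotator reliability from redundant labels over the 11 emotions. The data layout, variable names, and the accuracy threshold are illustrative assumptions, not the authors' exact procedure or the official MAFW annotation format.

```python
# Generic Dawid-Skene-style EM sketch for estimating annotator reliability,
# in the spirit of the EM filtering step described in the abstract.
# All names, the synthetic data, and the 0.5 threshold are assumptions.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness",
            "surprise", "contempt", "anxiety", "helplessness", "disappointment"]

def dawid_skene(labels, n_classes, n_iter=50, tol=1e-6):
    """labels: (n_clips, n_annotators) int array, -1 where an annotator skipped a clip."""
    n_items, n_annotators = labels.shape
    # Initialise per-clip class posteriors with simple vote shares.
    post = np.zeros((n_items, n_classes))
    for i in range(n_items):
        obs = labels[i][labels[i] >= 0]
        post[i] = np.bincount(obs, minlength=n_classes) / max(len(obs), 1)
    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices,
        # conf[j, t, k] = P(annotator j reports k | true class t).
        priors = post.mean(axis=0)
        conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
        for j in range(n_annotators):
            seen = labels[:, j] >= 0
            for k in range(n_classes):
                conf[j, :, k] += post[seen][labels[seen, j] == k].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: recompute clip posteriors from priors and confusion matrices.
        new_post = np.tile(np.log(priors + 1e-12), (n_items, 1))
        for j in range(n_annotators):
            seen = labels[:, j] >= 0
            new_post[seen] += np.log(conf[j, :, labels[seen, j]] + 1e-12)
        new_post = np.exp(new_post - new_post.max(axis=1, keepdims=True))
        new_post /= new_post.sum(axis=1, keepdims=True)
        if np.abs(new_post - post).max() < tol:
            post = new_post
            break
        post = new_post
    return post, conf

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true = rng.integers(0, len(EMOTIONS), size=300)
    # Three synthetic annotators: two mostly reliable, one noisy.
    labels = np.stack(
        [np.where(rng.random(300) < p, true, rng.integers(0, len(EMOTIONS), size=300))
         for p in (0.9, 0.85, 0.3)], axis=1)
    post, conf = dawid_skene(labels, n_classes=len(EMOTIONS))
    # Flag annotators whose estimated accuracy (mean diagonal of their
    # confusion matrix) falls below a chosen threshold.
    accuracy = conf.diagonal(axis1=1, axis2=2).mean(axis=1)
    print("estimated per-annotator accuracy:", np.round(accuracy, 2))
    print("reliable annotators:", accuracy > 0.5)
```

The per-clip posteriors returned by such a procedure could also be thresholded to decide between single-label and multi-label compound categories, but that mapping is not specified here and would need to follow the paper.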
Related papers
- AffectNet+: A Database for Enhancing Facial Expression Recognition with Soft-Labels [2.644902054473556]
We propose a new approach to create FER datasets through a labeling method in which an image is labeled with more than one emotion.
Our proposed method offers several advantages, including smoother decision boundaries, multi-labeling, and mitigation of bias and data imbalance.
Building upon AffectNet, we introduce AffectNet+, the next-generation facial expression dataset.
arXiv Detail & Related papers (2024-10-29T19:57:10Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- Learning Emotion Representations from Verbal and Nonverbal Communication [7.747924294389427]
We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication.
We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning.
EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains.
arXiv Detail & Related papers (2023-05-22T21:36:55Z)
- When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework [60.51225419301642]
We propose an Emotion Guided Similarity Network (EGS-Net) to address the diversity of human emotions in practical scenarios.
EGS-Net consists of an emotion branch and a similarity branch, based on a two-stage learning framework.
Experimental results on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method over several state-of-the-art methods.
arXiv Detail & Related papers (2022-01-18T07:24:12Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Recognizing Emotions evoked by Movies using Multitask Learning [3.4290619267487488]
Methods for recognizing evoked emotions are usually trained on human-annotated data.
We propose two deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture.
Our results show that the MT approach can more accurately model each viewer and the aggregated annotation when compared to methods that are directly trained on the aggregated annotations.
arXiv Detail & Related papers (2021-07-30T10:21:40Z)
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation that utilizes prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- RAF-AU Database: In-the-Wild Facial Expressions with Subjective Emotion Judgement and Objective AU Annotations [36.93475723886278]
We develop the RAF-AU database, which employs a sign-based (i.e., AUs) and judgement-based (i.e., perceived emotion) approach to annotating blended facial expressions in the wild.
We also conduct a preliminary investigation of which key AUs contribute most to a perceived emotion, and the relationship between AUs and facial expressions.
arXiv Detail & Related papers (2020-08-12T09:29:16Z)
- HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild [0.0]
We release a new natural-state video database called HEU Emotion.
HEU Emotion contains a total of 19,004 video clips, which are divided into two parts according to the data source.
The recognition accuracies for the two parts increased by 2.19% and 4.01%, respectively, over those of single-modal facial expression recognition.
arXiv Detail & Related papers (2020-07-24T13:36:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.