AffectNet+: A Database for Enhancing Facial Expression Recognition with Soft-Labels
- URL: http://arxiv.org/abs/2410.22506v1
- Date: Tue, 29 Oct 2024 19:57:10 GMT
- Title: AffectNet+: A Database for Enhancing Facial Expression Recognition with Soft-Labels
- Authors: Ali Pourramezan Fard, Mohammad Mehdi Hosseini, Timothy D. Sweeny, Mohammad H. Mahoor
- Abstract summary: We propose a new approach to create FER datasets through a labeling method in which an image is labeled with more than one emotion.
Finding smoother decision boundaries, enabling multi-labeling, and mitigating bias and imbalanced data are some of the advantages of our proposed method.
Building upon AffectNet, we introduce AffectNet+, the next-generation facial expression dataset.
- Abstract: Automated Facial Expression Recognition (FER) is challenging due to intra-class variations and inter-class similarities. FER can be especially difficult when facial expressions reflect a mixture of various emotions (aka compound expressions). Existing FER datasets, such as AffectNet, provide discrete emotion labels (hard-labels), where a single category of emotion is assigned to an expression. To alleviate inter- and intra-class challenges, as well as provide a better facial expression descriptor, we propose a new approach to create FER datasets through a labeling method in which an image is labeled with more than one emotion (called soft-labels), each with different confidences. Specifically, we introduce the notion of soft-labels for facial expression datasets, a new approach to affective computing for more realistic recognition of facial expressions. To achieve this goal, we propose a novel methodology to accurately calculate soft-labels: a vector representing the extent to which multiple categories of emotion are simultaneously present within a single facial expression. Finding smoother decision boundaries, enabling multi-labeling, and mitigating bias and imbalanced data are some of the advantages of our proposed method. Building upon AffectNet, we introduce AffectNet+, the next-generation facial expression dataset. This dataset contains soft-labels, three categories of data complexity subsets, and additional metadata such as age, gender, ethnicity, head pose, facial landmarks, valence, and arousal. AffectNet+ will be made publicly accessible to researchers.
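As an illustrative sketch only (not the authors' released annotation format), a soft label can be represented as a normalized vector of per-emotion confidences, with a hard label recovered by taking the dominant entry. The emotion categories and confidence values below are hypothetical:

```python
# Hypothetical emotion categories; AffectNet+'s actual label set may differ.
EMOTIONS = ["neutral", "happy", "sad", "surprise",
            "fear", "disgust", "anger", "contempt"]

def to_soft_label(confidences):
    """Normalize raw per-emotion confidences into a soft-label vector that
    sums to 1, so each entry reflects the extent to which that emotion is
    present in the expression."""
    total = sum(confidences.values())
    if total <= 0:
        raise ValueError("at least one emotion must have positive confidence")
    return [confidences.get(e, 0.0) / total for e in EMOTIONS]

def to_hard_label(soft_label):
    """Collapse a soft label back to the single dominant emotion, as
    conventional hard-label datasets such as AffectNet do."""
    return EMOTIONS[max(range(len(EMOTIONS)), key=lambda i: soft_label[i])]

# A compound expression: mostly surprise with some fear mixed in.
soft = to_soft_label({"surprise": 0.6, "fear": 0.3, "happy": 0.1})
dominant = to_hard_label(soft)  # "surprise"
```

A hard label discards everything but the dominant entry; the soft vector keeps the secondary emotions, which is what enables the smoother decision boundaries and multi-labeling the abstract describes.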
Related papers
- Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels.
We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations.
We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z)
- CAGE: Circumplex Affect Guided Expression Inference [9.108319009019912]
We present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect.
We propose a model for the prediction of facial expressions tailored for lightweight applications.
arXiv Detail & Related papers (2024-04-23T12:30:17Z)
- VISTANet: VIsual Spoken Textual Additive Net for Interpretable Multimodal Emotion Recognition [21.247650660908484]
This paper proposes a multimodal emotion recognition system, the VIsual Spoken Textual Additive Net (VISTANet).
The VISTANet fuses information from image, speech, and text modalities using a hybrid of early and late fusion.
The KAAP technique computes the contribution of each modality and corresponding features toward predicting a particular emotion class.
arXiv Detail & Related papers (2022-08-24T11:35:51Z)
- MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
arXiv Detail & Related papers (2022-08-01T13:34:33Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image by generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- Label Distribution Amendment with Emotional Semantic Correlations for Facial Expression Recognition [69.18918567657757]
We propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
By comparing semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated.
Experimental results demonstrate that the proposed method is more effective than the compared state-of-the-art methods.
arXiv Detail & Related papers (2021-07-23T07:46:14Z)
- Interpretable Image Emotion Recognition: A Domain Adaptation Approach Using Facial Expressions [11.808447247077902]
This paper proposes a feature-based domain adaptation technique for identifying emotions in generic images.
It addresses the challenge of the limited availability of pre-trained models and well-annotated datasets for Image Emotion Recognition (IER).
The proposed IER system demonstrated emotion classification accuracies of 60.98% for the IAPSa dataset, 58.86% for the ArtPhoto dataset, 69.13% for the FI dataset, and 58.06% for the EMOTIC dataset.
arXiv Detail & Related papers (2020-11-17T02:55:16Z)
- Facial Emotion Recognition with Noisy Multi-task Annotations [88.42023952684052]
We introduce a new problem of facial emotion recognition with noisy multi-task annotations.
For this new problem, we suggest a formulation from the viewpoint of joint distribution matching.
We exploit a new method that enables both emotion prediction and joint distribution learning.
arXiv Detail & Related papers (2020-10-19T20:39:37Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Since uneven data distributions and sample scarcity are common in real-world scenarios, we evaluate several few-shot expression learning tasks.
We propose a unified task-driven framework, the Compositional Generative Adversarial Network (Comp-GAN), that learns to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.