Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation
- URL: http://arxiv.org/abs/2506.20867v1
- Date: Wed, 25 Jun 2025 22:36:42 GMT
- Title: Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation
- Authors: Ryosuke Kawamura, Hideaki Hayashi, Shunsuke Otake, Noriko Takemura, Hajime Nagahara
- Abstract summary: We propose MIDAS, a data augmentation method designed to enhance DFER performance for ambiguous facial expression data. MIDAS augments training data by convexly combining pairs of video frames and their corresponding emotion class labels. Results show that models trained with MIDAS-augmented data achieve superior performance compared to the state-of-the-art method trained on the original dataset.
- Score: 11.08736594484412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic facial expression recognition (DFER) is a task that estimates emotions from facial expression video sequences. For practical applications, accurately recognizing ambiguous facial expressions -- frequently encountered in in-the-wild data -- is essential. In this study, we propose MIDAS, a data augmentation method designed to enhance DFER performance for ambiguous facial expression data using soft labels representing probabilities of multiple emotion classes. MIDAS augments training data by convexly combining pairs of video frames and their corresponding emotion class labels. This approach extends mixup to soft-labeled video data, offering a simple yet highly effective method for handling ambiguity in DFER. To evaluate MIDAS, we conducted experiments on both the DFEW dataset and FERV39k-Plus, a newly constructed dataset that assigns soft labels to an existing DFER dataset. The results demonstrate that models trained with MIDAS-augmented data achieve superior performance compared to the state-of-the-art method trained on the original dataset.
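The abstract describes MIDAS only at a high level: a mixup-style convex combination of paired video frames and their soft emotion labels. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation; the function name, array shapes, and the Beta(alpha, alpha) sampling of the mixing coefficient are conventions carried over from standard mixup.

```python
import numpy as np

def mix_soft_labeled_videos(frames_a, soft_labels_a,
                            frames_b, soft_labels_b,
                            alpha=1.0, rng=None):
    """Convexly combine two soft-labeled frame sequences, mixup-style.

    frames_*:      float arrays of shape (T, H, W, C), aligned in length (assumed).
    soft_labels_*: float arrays of shape (num_classes,) summing to 1.
    alpha:         Beta-distribution parameter (assumed; standard mixup default).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # one mixing coefficient shared by frames and labels

    mixed_frames = lam * frames_a + (1.0 - lam) * frames_b          # frame-wise convex combination
    mixed_labels = lam * soft_labels_a + (1.0 - lam) * soft_labels_b  # matching label combination
    return mixed_frames, mixed_labels

# Example with dummy data: two 16-frame clips and 7 emotion classes.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip_a = rng.random((16, 112, 112, 3))
    clip_b = rng.random((16, 112, 112, 3))
    labels_a = np.array([0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0])
    labels_b = np.array([0.0, 0.1, 0.0, 0.7, 0.2, 0.0, 0.0])
    mixed_clip, mixed_labels = mix_soft_labeled_videos(clip_a, labels_a, clip_b, labels_b)
    print(mixed_labels, mixed_labels.sum())  # still a valid soft label (sums to 1)
```

Using a single mixing coefficient for both the frames and the labels keeps each augmented clip consistent with its label distribution, which is the point of extending mixup to soft-labeled video data.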
Related papers
- MIDAS: Mixing Ambiguous Data with Soft Labels for Dynamic Facial Expression Recognition [11.89503569570198]
We propose MIDAS, a data augmentation method for dynamic facial expression recognition (DFER). In MIDAS, the training data are augmented by convexly combining pairs of video frames and their corresponding emotion class labels. The results demonstrate that the model trained on the data augmented by MIDAS outperforms the existing state-of-the-art method trained on the original dataset.
arXiv Detail & Related papers (2025-02-28T21:39:19Z)
- AffectNet+: A Database for Enhancing Facial Expression Recognition with Soft-Labels [2.644902054473556]
We propose a new approach to create FER datasets through a labeling method in which an image is labeled with more than one emotion.
Finding smoother decision boundaries, enabling multi-labeling, and mitigating bias and imbalanced data are some of the advantages of our proposed method.
Building upon AffectNet, we introduce AffectNet+, the next-generation facial expression dataset.
arXiv Detail & Related papers (2024-10-29T19:57:10Z)
- Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data [83.48170683672427]
We propose a unified dual-modal learning framework that integrates SFER data as a complementary resource for DFER. S4D employs dual-modal self-supervised pre-training on facial images and videos using a shared Transformer (ViT) encoder-decoder architecture. Experiments demonstrate that S4D achieves a deeper understanding of DFER, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-09-10T01:57:57Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We further propose a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.