Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
- URL: http://arxiv.org/abs/2407.11243v2
- Date: Thu, 08 May 2025 18:07:28 GMT
- Title: Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
- Authors: Mang Ning, Albert Ali Salah, Itir Onal Ertugrul
- Abstract summary: We revisit two fundamental factors in AU detection: diverse and large-scale data and subject identity regularization. Pretraining a masked autoencoder on Face9M yields strong performance in AU detection and facial expression tasks. Our proposed methods, Facial Masked Autoencoder (FMAE) and Identity Adversarial Training (IAT), are simple, generic and effective.
- Score: 3.350769246260559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial Action Unit (AU) detection has gained significant attention as it enables the breakdown of complex facial expressions into individual muscle movements. In this paper, we revisit two fundamental factors in AU detection: diverse and large-scale data and subject identity regularization. Motivated by recent advances in foundation models, we highlight the importance of data and introduce Face9M, a diverse dataset comprising 9 million facial images from multiple public sources. Pretraining a masked autoencoder on Face9M yields strong performance in AU detection and facial expression tasks. More importantly, we emphasize that Identity Adversarial Training (IAT) has not been well explored in AU tasks. To fill this gap, we first show that subject identity in AU datasets creates shortcut learning for the model and leads to sub-optimal solutions for AU prediction. Secondly, we demonstrate that strong IAT regularization is necessary to learn identity-invariant features. Finally, we elucidate the design space of IAT and empirically show that IAT circumvents identity-based shortcut learning and results in a better solution. Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective. Remarkably, the proposed FMAE-IAT approach achieves new state-of-the-art F1 scores on the BP4D (67.1%), BP4D+ (66.8%), and DISFA (70.1%) databases, significantly outperforming previous work. We release the code and model at https://github.com/forever208/FMAE-IAT.
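The abstract describes IAT only at a high level. A common way to realize identity adversarial training is a gradient reversal layer (GRL) feeding an auxiliary identity classifier, as popularized by DANN; the PyTorch sketch below illustrates that pattern under this assumption. The module and parameter names (AUModelWithIAT, lam) are hypothetical, not taken from the released FMAE-IAT code.

```python
# Minimal sketch of identity adversarial training via a gradient reversal
# layer (GRL). Hypothetical names; not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the
    backward pass, so the encoder learns to fool the identity head."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AUModelWithIAT(nn.Module):
    def __init__(self, encoder, feat_dim, num_aus, num_ids, lam=1.0):
        super().__init__()
        self.encoder = encoder              # e.g. a ViT pretrained as an MAE
        self.au_head = nn.Linear(feat_dim, num_aus)
        self.id_head = nn.Linear(feat_dim, num_ids)
        self.lam = lam                      # strength of the IAT regularizer

    def forward(self, x):
        feat = self.encoder(x)
        au_logits = self.au_head(feat)
        # Gradients from the identity loss are reversed before reaching
        # the encoder, pushing the features toward identity invariance.
        id_logits = self.id_head(GradReverse.apply(feat, self.lam))
        return au_logits, id_logits

def training_loss(au_logits, id_logits, au_labels, id_labels):
    au_loss = F.binary_cross_entropy_with_logits(au_logits, au_labels)
    id_loss = F.cross_entropy(id_logits, id_labels)
    return au_loss + id_loss  # the GRL handles the adversarial sign flip
```

On this reading, the paper's finding that strong IAT regularization is necessary would correspond to choosing a large lam.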
Related papers
- Action Unit Enhance Dynamic Facial Expression Recognition [7.142118694918976]
We propose an AU-enhanced Dynamic Facial Expression Recognition architecture, AU-DFER, to enhance the effectiveness of deep learning modeling. The contribution of Action Units (AUs) to different expressions is quantified, and a weight matrix is designed to incorporate this prior knowledge. Experiments are conducted on three recent mainstream open-source DFER approaches on the principal datasets in this field.
arXiv Detail & Related papers (2025-07-10T11:59:43Z)
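As a rough illustration of how a prior AU-to-expression weight matrix could enter the training loss, here is a hedged PyTorch sketch; the matrix values, shapes, and the auxiliary-loss form are assumptions for illustration, not AU-DFER's actual design.

```python
# Illustrative sketch: folding a prior AU-to-expression weight matrix into
# the loss. Values and the loss form are assumptions, not the paper's.
import torch
import torch.nn.functional as F

NUM_AUS, NUM_EXPR = 12, 7
# W[a, e]: prior contribution of AU a to expression e; in practice this
# would be quantified from annotation statistics, here it is a placeholder.
W = torch.rand(NUM_AUS, NUM_EXPR)
W = W / W.sum(dim=0, keepdim=True)          # normalize per expression

def au_enhanced_loss(expr_logits, au_logits, expr_labels, alpha=0.5):
    ce = F.cross_entropy(expr_logits, expr_labels)
    # AU activations, projected through the prior, should also point to
    # the ground-truth expression.
    expr_from_aus = torch.sigmoid(au_logits) @ W          # (B, NUM_EXPR)
    aux = F.nll_loss(torch.log_softmax(expr_from_aus, dim=-1), expr_labels)
    return ce + alpha * aux                 # alpha is an assumed trade-off
```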
- Federated Learning for Face Recognition via Intra-subject Self-supervised Learning [3.9899461012388504]
We propose FedFS (Federated Learning for personalized Face recognition via intra-subject Self-supervised learning), a framework for training personalized face recognition models without centralizing subjects' data.
FedFS comprises two crucial components that leverage aggregated features of the local and global models and cooperate with the representations of an off-the-shelf model.
We conduct comprehensive experiments on the DigiFace-1M and VGGFace datasets, demonstrating superior performance compared to previous methods.
arXiv Detail & Related papers (2024-07-23T08:43:42Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition [1.4374467687356276]
This paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and a multi-view Fusion Attention mechanism for expression classification.
We suggest easy-to-implement, training-free frameworks aimed at highlighting key facial features to determine whether such features can serve as guides for the model.
The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset.
arXiv Detail & Related papers (2024-03-19T16:21:47Z)
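The summary does not detail the fusion mechanism; a generic attention-based fusion over several feature "views" might look like the sketch below (a plausible reading, not the paper's exact Fusion Attention module).

```python
# Generic attention fusion over V feature views; an assumption about the
# mechanism's overall shape, not the paper's exact module.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)    # one scalar score per view

    def forward(self, views):             # views: (batch, num_views, dim)
        attn = torch.softmax(self.score(views), dim=1)   # weight each view
        return (attn * views).sum(dim=1)                 # fused (batch, dim)
```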
- Contrastive Learning of Person-independent Representations for Facial Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in a two-fold manner.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z)
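The within-clip contrastive signal is plausibly an InfoNCE-style objective: frames from the same clip act as positives, frames from other clips as negatives. The sketch below shows that standard loss; the cross-identity reconstruction branch is omitted, and all names are illustrative.

```python
# Standard InfoNCE loss, sketching a within-clip contrastive signal.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.07):
    """anchor, positive: (B, D) embeddings from the same clip;
    negatives: (B, K, D) embeddings drawn from other clips."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True)        # (B, 1)
    neg = torch.einsum('bd,bkd->bk', anchor, negatives)    # (B, K)
    logits = torch.cat([pos, neg], dim=1) / tau
    # The positive sits at index 0 of every row of logits.
    labels = torch.zeros(anchor.size(0), dtype=torch.long,
                         device=anchor.device)
    return F.cross_entropy(logits, labels)
```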
- Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification [78.08536797239893]
We propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules.
MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips.
We show that MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
arXiv Detail & Related papers (2023-01-02T05:17:31Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping [62.38898610210771]
We present a new single-stage method for subject face swapping and identity transfer, named FaceDancer.
We have two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR).
arXiv Detail & Related papers (2022-10-19T11:31:38Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance over six different datasets with very distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition [12.661683851729679]
The paper describes our proposed methodology for the six basic expression classification track of Affective Behavior Analysis in-the-wild (ABAW) Competition 2022.
Because of the ambiguity of the synthetic data and the objectivity of facial Action Units (AUs), we resort to AU information to boost performance.
arXiv Detail & Related papers (2022-07-20T09:33:39Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge.
Under this condition, the occluded representation can be spontaneously well aligned in a selected subspace.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Cross-subject Action Unit Detection with Meta Learning and Transformer-based Relation Modeling [7.395396464857193]
The paper proposes a meta-learning-based cross-subject AU detection model to eliminate the identity-caused differences.
A transformer-based relation learning module is introduced to learn the latent relations of multiple AUs.
Our results show that our method outperforms the state of the art on the two public datasets BP4D and DISFA.
arXiv Detail & Related papers (2022-05-18T08:17:59Z)
- Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition [27.34564955127377]
The activations of Facial Action Units (AUs) mutually influence one another.
Existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display.
This paper proposes an AU relationship modelling approach that learns a unique graph to explicitly describe the relationship between each pair of AUs.
arXiv Detail & Related papers (2022-05-02T03:38:00Z)
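One plausible realization of per-pair, multi-dimensional edge features is a message-passing layer with a learnable edge embedding for every ordered AU pair, sketched below; the dimensions and update rule are illustrative, not the paper's architecture.

```python
# Sketch of AU relation modelling with learned per-pair edge features:
# each AU node aggregates messages weighted by a multi-dimensional edge
# embedding. Layer choices here are assumptions for illustration.
import torch
import torch.nn as nn

class AURelationLayer(nn.Module):
    def __init__(self, num_aus, dim, edge_dim):
        super().__init__()
        # One learnable edge embedding per ordered AU pair (i, j).
        self.edge = nn.Parameter(torch.randn(num_aus, num_aus, edge_dim))
        self.msg = nn.Linear(dim + edge_dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, nodes):                 # nodes: (B, N, D), N = #AUs
        B, N, D = nodes.shape
        e = self.edge.unsqueeze(0).expand(B, -1, -1, -1)    # (B, N, N, E)
        src = nodes.unsqueeze(1).expand(B, N, N, D)         # sender features
        m = self.msg(torch.cat([src, e], dim=-1)).mean(dim=2)  # aggregate
        out = self.upd(m.reshape(B * N, D), nodes.reshape(B * N, D))
        return out.view(B, N, D)
```

Stacking a couple of such layers before per-AU classifiers would let each AU's prediction draw on cues from its related AUs.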
- Deep Multi-task Multi-label CNN for Effective Facial Attribute Classification [53.58763562421771]
We propose a novel deep multi-task multi-label CNN, termed DMM-CNN, for effective Facial Attribute Classification (FAC).
Specifically, DMM-CNN jointly optimizes two closely related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning.
Two different network architectures are designed to extract features for two groups of attributes, and a novel dynamic weighting scheme is proposed to automatically assign the loss weight to each facial attribute during training.
arXiv Detail & Related papers (2020-02-10T12:34:16Z)
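The dynamic weighting scheme is only named above; one simple instantiation keeps a running average of each attribute's loss and up-weights harder attributes, as sketched below. The inverse-difficulty rule is an assumption, not DMM-CNN's exact formula.

```python
# Sketch of dynamic per-attribute loss weighting: attributes whose recent
# loss is high get larger weights. The rule is illustrative, not the paper's.
import torch
import torch.nn.functional as F

class DynamicAttributeWeights:
    def __init__(self, num_attrs, momentum=0.9):
        self.avg = torch.ones(num_attrs)   # running per-attribute loss
        self.m = momentum

    def __call__(self, logits, labels):
        # logits, labels: (B, num_attrs); multi-label binary attributes.
        per_attr = F.binary_cross_entropy_with_logits(
            logits, labels.float(), reduction='none').mean(dim=0)
        self.avg = self.m * self.avg + (1 - self.m) * per_attr.detach()
        w = self.avg / self.avg.mean()     # up-weight harder attributes
        return (w * per_attr).sum()
```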