Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
- URL: http://arxiv.org/abs/2407.11243v1
- Date: Mon, 15 Jul 2024 21:13:28 GMT
- Title: Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
- Authors: Mang Ning, Albert Ali Salah, Itir Onal Ertugrul
- Abstract summary: We show that subject identity provides a learning shortcut for the model and leads to sub-optimal solutions for AU prediction.
We propose Identity Adversarial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features.
Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective.
- Score: 3.350769246260559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial Action Unit (AU) detection has gained significant research attention as AUs contain complex expression information. In this paper, we unpack two fundamental factors in AU detection: data and subject identity regularization. Motivated by recent advances in foundation models, we highlight the importance of data and collect a diverse dataset Face9M, comprising 9 million facial images, from multiple public resources. Pretraining a masked autoencoder on Face9M yields strong performance in AU detection and facial expression tasks. We then show that subject identity in AU datasets provides a learning shortcut for the model and leads to sub-optimal solutions for AU prediction. To tackle this generic issue of AU tasks, we propose Identity Adversarial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features. Furthermore, we elucidate the design space of IAT and empirically show that IAT circumvents the identity shortcut learning and results in a better solution. Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective. Remarkably, the proposed FMAE-IAT approach achieves new state-of-the-art F1 scores on the BP4D (67.1%), BP4D+ (66.8%), and DISFA (70.1%) databases, significantly outperforming previous work. We release the code and model at https://github.com/forever208/FMAE-IAT, the first open-sourced facial model pretrained on 9 million diverse images.
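The abstract does not spell out how IAT is implemented. A common way to realize identity-adversarial regularization is a gradient-reversal layer feeding an auxiliary identity classifier on top of the pretrained encoder, sketched below in PyTorch. The encoder placeholder, the head shapes, and the `lambda_id` weight are illustrative assumptions, not the authors' released FMAE-IAT code.

```python
# Minimal sketch of identity adversarial training (IAT) on top of a masked-
# autoencoder face encoder. Gradient reversal is one plausible realization of
# the adversarial identity objective; details here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity mapping forward; multiplies gradients by -lambda backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class FMAEWithIAT(nn.Module):
    def __init__(self, encoder, feat_dim, num_aus, num_identities, lambda_id=1.0):
        super().__init__()
        self.encoder = encoder  # e.g. a ViT pretrained with masked autoencoding on Face9M
        self.au_head = nn.Linear(feat_dim, num_aus)          # multi-label AU logits
        self.id_head = nn.Linear(feat_dim, num_identities)   # adversarial identity logits
        self.lambda_id = lambda_id                            # identity-regularization strength

    def forward(self, images):
        feats = self.encoder(images)  # assumed to return (batch, feat_dim) features
        au_logits = self.au_head(feats)
        # Reverse gradients so the encoder is pushed toward identity-invariant features
        rev_feats = GradientReversal.apply(feats, self.lambda_id)
        id_logits = self.id_head(rev_feats)
        return au_logits, id_logits


def training_step(model, images, au_labels, id_labels):
    au_logits, id_logits = model(images)
    loss_au = F.binary_cross_entropy_with_logits(au_logits, au_labels)
    loss_id = F.cross_entropy(id_logits, id_labels)
    # The identity head learns to classify identity, while the reversed gradients
    # prevent the encoder from exploiting the identity shortcut.
    return loss_au + loss_id
```

Under this reading, the "strong IAT regularization" reported in the paper would correspond to a large `lambda_id`, which scales the reversed identity gradients flowing back into the encoder.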
Related papers
- Federated Learning for Face Recognition via Intra-subject Self-supervised Learning [3.9899461012388504]
We propose FedFS (Federated Learning for personalized Face recognition via intra-subject Self-supervised learning framework) to train personalized face recognition models without imposing subjects.
FedFS comprises two crucial components that leverage aggregated features of the local and global models to cooperate with representations of an off-the-shelf model.
We conduct comprehensive experiments on the DigiFace-1M and VGGFace datasets, demonstrating superior performance compared to previous methods.
arXiv Detail & Related papers (2024-07-23T08:43:42Z) - Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z) - Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition [1.4374467687356276]
This paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification.
We suggest easy-to-implement, training-free frameworks that highlight key facial features, to determine whether such features can serve as guides for the model.
The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset.
arXiv Detail & Related papers (2024-03-19T16:21:47Z) - Contrastive Learning of Person-independent Representations for Facial Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in two ways.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z) - Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification [78.08536797239893]
We propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules.
MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips.
We show that MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
arXiv Detail & Related papers (2023-01-02T05:17:31Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance across six different datasets with distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z) - AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition [12.661683851729679]
The paper describes our proposed methodology for the six basic expression classification track of Affective Behavior Analysis in-the-wild (ABAW) Competition 2022.
Because of the ambiguity of the synthetic data and the objectivity of facial Action Units (AUs), we resort to AU information to boost performance.
arXiv Detail & Related papers (2022-07-20T09:33:39Z) - Cross-subject Action Unit Detection with Meta Learning and Transformer-based Relation Modeling [7.395396464857193]
The paper proposes a meta-learning-based cross-subject AU detection model to eliminate the identity-caused differences.
A transformer-based relation learning module is introduced to learn the latent relations of multiple AUs.
Our results show that our method outperforms the state of the art on the two public datasets BP4D and DISFA.
arXiv Detail & Related papers (2022-05-18T08:17:59Z) - Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition [27.34564955127377]
The activations of Facial Action Units (AUs) mutually influence one another.
Existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display.
This paper proposes an AU relationship modelling approach that learns a unique graph to explicitly describe the relationship between each pair of AUs in each facial display.
arXiv Detail & Related papers (2022-05-02T03:38:00Z) - Deep Multi-task Multi-label CNN for Effective Facial Attribute Classification [53.58763562421771]
We propose a novel deep multi-task multi-label CNN, termed DMM-CNN, for effective Facial Attribute Classification (FAC).
Specifically, DMM-CNN jointly optimizes two closely related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning.
Two different network architectures are respectively designed to extract features for two groups of attributes, and a novel dynamic weighting scheme is proposed to automatically assign the loss weight to each facial attribute during training.
arXiv Detail & Related papers (2020-02-10T12:34:16Z)