Boosting Facial Action Unit Detection Through Jointly Learning Facial
Landmark Detection and Domain Separation and Reconstruction
- URL: http://arxiv.org/abs/2310.05207v5
- Date: Mon, 4 Mar 2024 12:42:07 GMT
- Title: Boosting Facial Action Unit Detection Through Jointly Learning Facial
Landmark Detection and Domain Separation and Reconstruction
- Authors: Ziqiao Shang, Li Yu
- Abstract summary: We propose a new AU detection framework where multi-task learning is introduced to jointly learn AU domain separation and reconstruction and facial landmark detection.
We also propose a new feature alignment scheme based on contrastive learning by simple projectors and an improved contrastive loss.
- Score: 4.4150617622399055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, how to introduce large amounts of unlabeled facial images in
the wild into supervised Facial Action Unit (AU) detection frameworks has become a
challenging problem. In this paper, we propose a new AU detection framework
where multi-task learning is introduced to jointly learn AU domain separation
and reconstruction and facial landmark detection by sharing the parameters of
homostructural facial extraction modules. In addition, we propose a new feature
alignment scheme based on contrastive learning by simple projectors and an
improved contrastive loss, which adds four additional intermediate supervisors
to promote the feature reconstruction process. Experimental results on two
benchmarks demonstrate the superiority of our method over state-of-the-art
methods for AU detection in the wild.
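The projector-based contrastive alignment described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the projector architecture, feature dimensions, and temperature are assumptions, the loss is a generic symmetric InfoNCE formulation, and the paper's four additional intermediate supervisors (which apply supervision at intermediate feature stages) are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Projector(nn.Module):
    """A simple MLP projector mapping backbone features into a shared
    embedding space (layer sizes here are assumptions)."""

    def __init__(self, in_dim: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so dot products below are cosine similarities
        return F.normalize(self.net(x), dim=-1)


def contrastive_alignment_loss(src_feat, tgt_feat, src_proj, tgt_proj, tau=0.1):
    """Symmetric InfoNCE-style alignment loss: same-index pairs across the
    two feature sets are positives, all other in-batch pairs are negatives."""
    z_s = src_proj(src_feat)                      # (B, D)
    z_t = tgt_proj(tgt_feat)                      # (B, D)
    logits = z_s @ z_t.t() / tau                  # (B, B) similarity matrix
    targets = torch.arange(z_s.size(0), device=z_s.device)  # diagonal positives
    # Average over both matching directions (src->tgt and tgt->src)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In a training loop, `src_feat` and `tgt_feat` could be, for example, reconstructed and original features from the domain separation and reconstruction branch, with the loss applied at several feature stages to promote reconstruction.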
Related papers
- Decoupled Doubly Contrastive Learning for Cross Domain Facial Action Unit Detection [66.80386429324196]
We propose a decoupled doubly contrastive adaptation (D$2$CA) approach to learn a purified AU representation.
D$2$CA is trained to disentangle AU and domain factors by assessing the quality of synthesized faces.
It consistently outperforms state-of-the-art cross-domain AU detection approaches.
arXiv Detail & Related papers (2025-03-12T00:42:17Z) - Context Enhancement with Reconstruction as Sequence for Unified Unsupervised Anomaly Detection [68.74469657656822]
Unsupervised anomaly detection (AD) aims to train robust detection models using only normal samples.
Recent research focuses on a unified unsupervised AD setting in which only one model is trained for all classes.
We introduce a novel Reconstruction as Sequence (RAS) method, which enhances the contextual correspondence during feature reconstruction.
arXiv Detail & Related papers (2024-09-10T07:37:58Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, and thus generalize more strongly.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection [6.371066478190595]
A novel dual feature-masking heterogeneous distillation framework termed DFMSD is proposed for object detection.
A masking enhancement strategy is combined with stage-wise learning to improve feature-masking reconstruction.
Experiments for the object detection task demonstrate the promise of our approach.
arXiv Detail & Related papers (2024-07-18T04:19:14Z) - Contrastive Learning of Person-independent Representations for Facial
Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in two-fold.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z) - Weakly Supervised Regional and Temporal Learning for Facial Action Unit
Recognition [36.350407471391065]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
A single-image-based optical flow estimation task is proposed to leverage the dynamic changes of facial muscles.
By incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning.
arXiv Detail & Related papers (2022-04-01T12:02:01Z) - Dual Contrastive Learning for General Face Forgery Detection [64.41970626226221]
We propose a novel face forgery detection framework, named Dual Contrastive Learning (DCL), which constructs positive and negative paired data.
To explore the essential discrepancies, Intra-Instance Contrastive Learning (Intra-ICL) is introduced to focus on the local content inconsistencies prevalent in the forged faces.
arXiv Detail & Related papers (2021-12-27T05:44:40Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - View-Invariant Gait Recognition with Attentive Recurrent Learning of
Partial Representations [27.33579145744285]
We propose a network that first learns to extract gait convolutional energy maps (GCEM) from frame-level convolutional features.
It then adopts a bidirectional neural network to learn from split bins of the GCEM, thus exploiting the relations between learned partial recurrent representations.
Our proposed model has been extensively tested on two large-scale CASIA-B and OU-M gait datasets.
arXiv Detail & Related papers (2020-10-18T20:20:43Z) - Gait Recognition using Multi-Scale Partial Representation Transformation
with Capsules [22.99694601595627]
We propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules.
Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor.
It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions.
arXiv Detail & Related papers (2020-10-18T19:47:38Z) - J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face
Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.