Self-supervised Facial Action Unit Detection with Region and Relation
Learning
- URL: http://arxiv.org/abs/2303.05708v1
- Date: Fri, 10 Mar 2023 05:22:45 GMT
- Title: Self-supervised Facial Action Unit Detection with Region and Relation
Learning
- Authors: Juan Song and Zhilei Liu
- Abstract summary: We propose a novel self-supervised framework for AU detection with region and relation learning.
An improved Optimal Transport (OT) algorithm is introduced to exploit the correlation characteristics among AUs.
Swin Transformer is exploited to model the long-distance dependencies within each AU region during feature learning.
- Score: 5.182661263082065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial action unit (AU) detection is a challenging task due to the scarcity
of manual annotations. Recent works on AU detection with self-supervised
learning have emerged to address this problem, aiming to learn meaningful AU
representations from large amounts of unlabeled data. However, most existing
self-supervised AU detection works utilize global facial features only, while
AU-related properties such as locality and relevance are not fully explored. In
this paper, we propose a novel self-supervised framework for AU detection with
region and relation learning. In particular, an AU-related attention map is
utilized to guide the model to focus more on AU-specific regions, enhancing the
integrity of AU local features. Meanwhile, an improved Optimal Transport (OT)
algorithm is introduced to exploit the correlation characteristics among AUs.
In addition, a Swin Transformer is exploited to model the long-distance
dependencies within each AU region during feature learning. The evaluation
results on BP4D and DISFA demonstrate that our proposed method is comparable or
even superior to state-of-the-art self-supervised learning methods and
supervised AU detection methods.
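The abstract does not give implementation details of its "improved OT algorithm," but entropy-regularized OT of this kind is commonly solved with Sinkhorn iterations. Below is a minimal, hypothetical sketch of computing a transport plan between two sets of AU feature vectors; the function name, feature shapes, and uniform marginals are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def sinkhorn_plan(feat_a, feat_b, eps=0.1, n_iters=500):
    """Entropy-regularized OT between two sets of AU feature vectors.

    feat_a: (n, d) array of features for n AU regions; feat_b: (m, d).
    Returns an (n, m) transport plan whose entries can be read as soft
    correspondences (correlation strengths) between AU regions.
    """
    # Pairwise squared-Euclidean cost between AU features.
    cost = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()            # normalize for numerical stability
    K = np.exp(-cost / eps)             # Gibbs kernel
    a = np.full(len(feat_a), 1.0 / len(feat_a))  # uniform source marginal
    b = np.full(len(feat_b), 1.0 / len(feat_b))  # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):            # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    # Transport plan P = diag(u) @ K @ diag(v)
    return u[:, None] * K * v[None, :]
```

By construction the rows of the returned plan sum to the source marginal and the columns (approximately, after enough iterations) to the target marginal, so each row gives a distribution describing how strongly one AU relates to the others.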
Related papers
- Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample [53.23474626420103]
Facial action unit (AU) detection remains a challenging task, due to the subtlety, dynamics, and diversity of AUs.
We propose a novel AU detection framework called AC2D that adaptively constrains the self-attention weight distribution.
Our method achieves competitive performance compared to state-of-the-art AU detection approaches on challenging benchmarks.
arXiv Detail & Related papers (2024-10-02T05:51:24Z)
- Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning [48.70249675019288]
We propose an end-to-end Vision-Language joint learning network for explainable facial action unit (AU) recognition.
The proposed approach achieves superior performance over the state-of-the-art methods on most metrics.
arXiv Detail & Related papers (2024-08-01T15:35:44Z)
- Contrastive Learning of Person-independent Representations for Facial Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in two ways.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z)
- Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection [12.677143408225167]
We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW).
We propose a single-stage trained AU detection framework. Specifically, in order to effectively extract facial local region features related to AU detection, we use a local region perception module.
We also use a graph neural network-based relational learning module to capture the relationship between AUs.
arXiv Detail & Related papers (2023-03-15T11:59:24Z)
- Attention Based Relation Network for Facial Action Units Recognition [8.522262699196412]
We propose a novel Attention Based Relation Network (ABRNet) for AU recognition.
ABRNet uses several relation learning layers to automatically capture different AU relations.
Our approach achieves state-of-the-art performance on the DISFA and DISFA+ datasets.
arXiv Detail & Related papers (2022-10-23T11:26:53Z)
- Weakly Supervised Regional and Temporal Learning for Facial Action Unit Recognition [36.350407471391065]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
By incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning.
arXiv Detail & Related papers (2022-04-01T12:02:01Z)
- Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition [29.664359264758495]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
To enhance the discrimination of regional features with AU relation embedding, we design a task of RoI inpainting to recover the randomly cropped AU patches.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
Based on these two self-supervised auxiliary tasks, the local features, mutual relations, and motion cues of AUs are better captured in the backbone network.
arXiv Detail & Related papers (2021-07-30T02:39:45Z)
- Meta Auxiliary Learning for Facial Action Unit Detection [84.22521265124806]
We consider learning AU detection and facial expression recognition in a multi-task manner.
The performance of the AU detection task cannot always be enhanced, due to negative transfer in the multi-task scenario.
We propose a Meta Learning method (MAL) that automatically selects highly related FE samples by learning adaptive weights for the training FE samples in a meta-learning manner.
arXiv Detail & Related papers (2021-05-14T02:28:40Z)
- AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition [79.8779790682205]
We propose an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition.
We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods.
arXiv Detail & Related papers (2020-12-29T03:42:04Z)
- J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.