Self-supervised Facial Action Unit Detection with Region and Relation
Learning
- URL: http://arxiv.org/abs/2303.05708v1
- Date: Fri, 10 Mar 2023 05:22:45 GMT
- Title: Self-supervised Facial Action Unit Detection with Region and Relation
Learning
- Authors: Juan Song and Zhilei Liu
- Abstract summary: We propose a novel self-supervised framework for AU detection with region and relation learning.
An improved Optimal Transport (OT) algorithm is introduced to exploit the correlation characteristics among AUs.
Swin Transformer is exploited to model the long-distance dependencies within each AU region during feature learning.
- Score: 5.182661263082065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial action unit (AU) detection is a challenging task due to the scarcity
of manual annotations. Recent works on AU detection with self-supervised
learning have emerged to address this problem, aiming to learn meaningful AU
representations from numerous unlabeled data. However, most existing AU
detection works with self-supervised learning utilize global facial features
only, while AU-related properties such as locality and relevance are not fully
explored. In this paper, we propose a novel self-supervised framework for AU
detection with region and relation learning. In particular, an AU-related
attention map is utilized to guide the model to focus on AU-specific
regions, enhancing the integrity of AU local features. Meanwhile, an improved
Optimal Transport (OT) algorithm is introduced to exploit the correlation
characteristics among AUs. In addition, Swin Transformer is exploited to model
the long-distance dependencies within each AU region during feature learning.
The evaluation results on BP4D and DISFA demonstrate that our proposed method
is comparable or even superior to the state-of-the-art self-supervised learning
methods and supervised AU detection methods.
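The abstract names an improved Optimal Transport algorithm for exploiting AU correlations but does not spell it out; as a rough, hedged sketch of the underlying idea, entropy-regularized OT can be solved with Sinkhorn iterations to produce a soft correspondence between two sets of AU features. The uniform marginals, cost matrix, regularization strength `eps`, and toy features below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=100):
    """Entropy-regularized OT via Sinkhorn iterations.

    cost: (n, m) pairwise cost matrix between two sets of AU features.
    Returns a transport plan with (approximately) uniform marginals.
    """
    n, m = cost.shape
    K = np.exp(-cost / eps)        # Gibbs kernel
    r = np.ones(n) / n             # uniform source marginal (assumption)
    c = np.ones(m) / m             # uniform target marginal (assumption)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = r / (K @ v)            # scale rows toward marginal r
        v = c / (K.T @ u)          # scale columns toward marginal c
    return u[:, None] * K * v[None, :]

# Toy example: cost from squared distances between AU-region embeddings
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(5, 16))   # 5 AU-region features
feats_b = rng.normal(size=(5, 16))
cost = ((feats_a[:, None, :] - feats_b[None, :, :]) ** 2).sum(-1)
plan = sinkhorn(cost)
print(plan.shape)   # the (5, 5) plan acts as a soft AU correspondence
```

Entries of the resulting plan can be read as soft correlation weights between AU features; the paper's "improved" variant presumably modifies this basic scheme.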
Related papers
- Contrastive Learning of Person-independent Representations for Facial
Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in a two-fold manner.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
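The summary above does not give the contrastive objective; a common choice for within-clip contrastive learning is an InfoNCE-style loss that pulls an anchor toward a positive from the same clip and pushes it from negatives drawn elsewhere. The temperature, feature shapes, and sampling scheme below are illustrative assumptions, not this paper's exact loss:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor: the positive (same clip) should
    score higher than the negatives (other clips / identities)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                  # positive sits at index 0

rng = np.random.default_rng(1)
a = rng.normal(size=32)                       # anchor AU feature
negs = [rng.normal(size=32) for _ in range(8)]
loss_easy = info_nce(a, a + 0.01 * rng.normal(size=32), negs)  # near-duplicate positive
loss_hard = info_nce(a, rng.normal(size=32), negs)             # unrelated "positive"
print(loss_easy, loss_hard)
```

As expected, the loss is small when the positive is genuinely similar to the anchor and large when it is not, which is what drives the representation toward clip-consistent AU features.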
arXiv Detail & Related papers (2024-03-06T01:49:28Z)
- Local Region Perception and Relationship Learning Combined with Feature
Fusion for Facial Action Unit Detection [12.677143408225167]
We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW).
We propose a single-stage trained AU detection framework. Specifically, in order to effectively extract facial local region features related to AU detection, we use a local region perception module.
We also use a graph neural network-based relational learning module to capture the relationship between AUs.
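The summary mentions a graph neural network-based relational module but gives no architecture; a minimal sketch of the idea is a single graph-convolution step in which each AU node aggregates features from related AUs over a fixed co-occurrence adjacency. The adjacency, dimensions, and normalization below are assumptions for illustration only:

```python
import numpy as np

def gcn_layer(node_feats, adj, weight):
    """One graph-convolution step: each AU node averages features from
    related AUs via a symmetrically normalized adjacency, then applies
    a linear map and ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = adj_hat.sum(axis=1)
    norm = adj_hat / np.sqrt(np.outer(deg, deg))   # D^{-1/2} A D^{-1/2}
    return np.maximum(norm @ node_feats @ weight, 0.0)

rng = np.random.default_rng(2)
n_aus, d_in, d_out = 12, 64, 32
feats = rng.normal(size=(n_aus, d_in))            # per-AU region features
adj = (rng.random((n_aus, n_aus)) > 0.7).astype(float)
adj = np.maximum(adj, adj.T)                      # symmetric AU relation graph
w = rng.normal(size=(d_in, d_out)) * 0.1
out = gcn_layer(feats, adj, w)
print(out.shape)
```

Stacking a few such layers lets AU predictions for one region be informed by evidence from correlated regions (e.g. co-occurring brow and lid actions).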
arXiv Detail & Related papers (2023-03-15T11:59:24Z)
- FAN-Trans: Online Knowledge Distillation for Facial Action Unit
Detection [45.688712067285536]
Leveraging the online knowledge distillation framework, we propose the FAN-Trans method for AU detection.
Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences.
arXiv Detail & Related papers (2022-11-11T11:35:33Z)
- Attention Based Relation Network for Facial Action Units Recognition [8.522262699196412]
We propose a novel Attention Based Relation Network (ABRNet) for AU recognition.
ABRNet uses several relation learning layers to automatically capture different AU relations.
Our approach achieves state-of-the-art performance on the DISFA and DISFA+ datasets.
arXiv Detail & Related papers (2022-10-23T11:26:53Z)
- Weakly Supervised Regional and Temporal Learning for Facial Action Unit
Recognition [36.350407471391065]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
By incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning.
arXiv Detail & Related papers (2022-04-01T12:02:01Z)
- Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action
Unit Recognition [29.664359264758495]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
To enhance the discrimination of regional features with AU relation embedding, we design a task of RoI inpainting to recover the randomly cropped AU patches.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
Based on these two self-supervised auxiliary tasks, local features, mutual relation and motion cues of AUs are better captured in the backbone network.
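The RoI-inpainting task described above crops AU patches and trains the network to recover them; the paper's exact cropping procedure is not given here, but the data-preparation step can be sketched as masking square patches around (assumed, landmark-derived) AU centers. Patch size, crop probability, and center locations below are illustrative assumptions:

```python
import numpy as np

def crop_au_patches(image, centers, patch=16, rng=None):
    """Zero out square patches around a random subset of AU centers.

    Returns the masked image (network input) and the boolean mask
    marking the cropped pixels (the inpainting target region)."""
    rng = rng or np.random.default_rng()
    masked = image.copy()
    mask = np.zeros(image.shape[:2], dtype=bool)
    h, w = image.shape[:2]
    for cy, cx in centers:
        if rng.random() < 0.5:            # crop each AU region with prob 0.5
            continue
        y0, x0 = max(cy - patch // 2, 0), max(cx - patch // 2, 0)
        y1, x1 = min(y0 + patch, h), min(x0 + patch, w)
        masked[y0:y1, x0:x1] = 0
        mask[y0:y1, x0:x1] = True
    return masked, mask

rng = np.random.default_rng(3)
img = rng.random((128, 128, 3))           # stand-in for an aligned face crop
centers = [(40, 40), (40, 88), (90, 64)]  # e.g. eye and mouth regions (assumed)
masked, mask = crop_au_patches(img, centers, rng=rng)
print(masked.shape, mask.sum())
```

A reconstruction loss restricted to the masked region then forces the backbone to encode local AU appearance well enough to fill the patch back in.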
arXiv Detail & Related papers (2021-07-30T02:39:45Z)
- Meta Auxiliary Learning for Facial Action Unit Detection [84.22521265124806]
We consider learning AU detection and facial expression recognition in a multi-task manner.
The performance of the AU detection task cannot always be enhanced due to negative transfer in the multi-task scenario.
We propose a Meta Auxiliary Learning method (MAL) that automatically selects highly related FE samples by learning adaptive weights for the training FE samples in a meta-learning manner.
arXiv Detail & Related papers (2021-05-14T02:28:40Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze datasets and automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- AU-Expression Knowledge Constrained Representation Learning for Facial
Expression Recognition [79.8779790682205]
We propose an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition.
We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods.
arXiv Detail & Related papers (2020-12-29T03:42:04Z)
- J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face
Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.