Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action
Unit Recognition
- URL: http://arxiv.org/abs/2107.14399v1
- Date: Fri, 30 Jul 2021 02:39:45 GMT
- Title: Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action
Unit Recognition
- Authors: Jingwei Yan and Jingjing Wang and Qiang Li and Chunmao Wang and
Shiliang Pu
- Abstract summary: We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
To enhance the discrimination of regional features with AU relation embedding, we design a task of RoI inpainting to recover the randomly cropped AU patches.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
Based on these two self-supervised auxiliary tasks, local features, mutual relations and motion cues of AUs are better captured in the backbone network.
- Score: 29.664359264758495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic facial action unit (AU) recognition is a challenging task due to
the scarcity of manual annotations. To alleviate this problem, substantial effort has been dedicated to methods that leverage the abundant unlabeled data. However, several unique properties of AUs, such as their regional and relational characteristics, have not been sufficiently explored in previous works. Motivated by this, we take these AU
properties into consideration and propose two auxiliary AU related tasks to
bridge the gap between limited annotations and the model performance in a
self-supervised manner via the unlabeled data. Specifically, to enhance the
discrimination of regional features with AU relation embedding, we design a
task of RoI inpainting to recover the randomly cropped AU patches. Meanwhile, a
single image based optical flow estimation task is proposed to leverage the
dynamic change of facial muscles and encode the motion information into the
global feature representation. Based on these two self-supervised auxiliary tasks, local features, mutual relations and motion cues of AUs are better captured in the backbone network with the proposed regional and temporal based auxiliary task learning (RTATL) framework. Extensive experiments on BP4D and DISFA demonstrate the superiority of our method, achieving new state-of-the-art performance.
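The abstract describes the RTATL setup only at a high level. As a rough illustration of how a shared backbone, a supervised AU head, and the two self-supervised auxiliary heads could be trained jointly, here is a minimal PyTorch sketch. All names (RTATLSketch, joint_loss, inpaint_head, flow_head), the toy backbone, the masking and pseudo-flow inputs, and the loss weights are assumptions made for illustration; the paper's actual architecture, AU relation embedding, and loss formulation are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RTATLSketch(nn.Module):
    """Toy multi-task model: a shared backbone feeds a supervised AU classifier
    plus two self-supervised heads (RoI inpainting and single-image flow)."""

    def __init__(self, num_aus: int = 12, feat_dim: int = 64):
        super().__init__()
        # Stand-in for the paper's backbone; downsamples the input by 4x.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Supervised multi-label AU recognition head.
        self.au_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, num_aus)
        )
        # Auxiliary head 1: reconstruct the image whose AU RoIs were cropped out.
        self.inpaint_head = nn.Conv2d(feat_dim, 3, 1)
        # Auxiliary head 2: predict a dense 2-channel flow field from one frame.
        self.flow_head = nn.Conv2d(feat_dim, 2, 1)

    def forward(self, images, masked_images):
        feat_full = self.backbone(images)           # features of intact frames
        feat_masked = self.backbone(masked_images)  # features of RoI-masked frames
        au_logits = self.au_head(feat_full)
        recon = F.interpolate(self.inpaint_head(feat_masked),
                              size=images.shape[-2:], mode="bilinear",
                              align_corners=False)
        flow = F.interpolate(self.flow_head(feat_full),
                             size=images.shape[-2:], mode="bilinear",
                             align_corners=False)
        return au_logits, recon, flow


def joint_loss(model, images, masked_images, roi_mask, au_labels, pseudo_flow,
               w_inpaint: float = 1.0, w_flow: float = 1.0):
    """Supervised AU loss plus the two self-supervised auxiliary losses."""
    au_logits, recon, flow = model(images, masked_images)
    loss_au = F.binary_cross_entropy_with_logits(au_logits, au_labels)
    # Inpainting loss counted only inside the cropped AU regions (roi_mask == 1).
    loss_inpaint = ((recon - images) ** 2 * roi_mask).sum() / roi_mask.sum().clamp(min=1)
    # Flow loss against a pseudo ground truth, e.g. from an external estimator.
    loss_flow = F.l1_loss(flow, pseudo_flow)
    return loss_au + w_inpaint * loss_inpaint + w_flow * loss_flow
```

In such a setup, masked_images would be the input with randomly selected AU RoIs cropped out and pseudo_flow a flow field estimated between neighbouring frames by an off-the-shelf method, so both auxiliary signals could be obtained from unlabeled video without extra annotation.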
Related papers
- Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample [53.23474626420103]
Facial action unit (AU) detection remains a challenging task, due to the subtlety, dynamics, and diversity of AUs.
We propose a novel AU detection framework called AC2D by adaptively constraining self-attention weight distribution.
Our method achieves competitive performance compared to state-of-the-art AU detection approaches on challenging benchmarks.
arXiv Detail & Related papers (2024-10-02T05:51:24Z) - Contrastive Learning of Person-independent Representations for Facial
Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in two-fold.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z) - Self-supervised Facial Action Unit Detection with Region and Relation
Learning [5.182661263082065]
We propose a novel self-supervised framework for AU detection with the region and relation learning.
An improved Optimal Transport (OT) algorithm is introduced to exploit the correlation characteristics among AUs.
Swin Transformer is exploited to model the long-distance dependencies within each AU region during feature learning.
arXiv Detail & Related papers (2023-03-10T05:22:45Z) - Weakly Supervised Regional and Temporal Learning for Facial Action Unit
Recognition [36.350407471391065]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
By incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning.
arXiv Detail & Related papers (2022-04-01T12:02:01Z) - Adaptive Local-Global Relational Network for Facial Action Units
Recognition and Facial Paralysis Estimation [22.85506776477092]
We propose a novel Adaptive Local-Global Network (ALGRNet) for facial AU recognition and apply it to facial paralysis estimation.
ALGRNet consists of three novel structures, including an adaptive region learning module that learns adaptive muscle regions based on the detected landmarks.
Experiments on the BP4D and DISFA AU datasets show that the proposed approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-03-03T16:14:49Z) - On Exploring Pose Estimation as an Auxiliary Learning Task for
Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z) - Multi-Level Adaptive Region of Interest and Graph Learning for Facial
Action Unit Recognition [30.129452080084224]
We propose a novel multi-level adaptive ROI and graph learning (MARGL) framework to tackle this problem.
In order to incorporate the intra-level AU relation and inter-level AU regional relevance simultaneously, a multi-level AU relation graph is constructed.
Experiments on BP4D and DISFA demonstrate the proposed MARGL significantly outperforms the previous state-of-the-art methods.
arXiv Detail & Related papers (2021-02-24T09:22:45Z) - AU-Guided Unsupervised Domain Adaptive Facial Expression Recognition [21.126514122636966]
This paper proposes an AU-guided unsupervised Domain Adaptive FER framework to relieve the annotation bias between different FER datasets.
To achieve domain-invariant compact features, we utilize AU-guided triplet training, which randomly collects anchor-positive-negative triplets on both domains based on their AUs (a generic sketch of such triplet mining appears after this list).
arXiv Detail & Related papers (2020-12-18T07:17:30Z) - A Transfer Learning approach to Heatmap Regression for Action Unit
intensity estimation [50.261472059743845]
Action Units (AUs) are geometrically-based atomic facial muscle movements.
We propose a novel AU modelling problem that consists of jointly estimating their localisation and intensity.
A heatmap models whether an AU occurs or not at a given spatial location.
arXiv Detail & Related papers (2020-04-14T16:51:13Z) - J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face
Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)