Self-supervised Facial Action Unit Detection with Region and Relation
Learning
- URL: http://arxiv.org/abs/2303.05708v1
- Date: Fri, 10 Mar 2023 05:22:45 GMT
- Title: Self-supervised Facial Action Unit Detection with Region and Relation
Learning
- Authors: Juan Song and Zhilei Liu
- Abstract summary: We propose a novel self-supervised framework for AU detection with region and relation learning.
An improved Optimal Transport (OT) algorithm is introduced to exploit the correlation characteristics among AUs.
Swin Transformer is exploited to model the long-distance dependencies within each AU region during feature learning.
- Score: 5.182661263082065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial action unit (AU) detection is a challenging task due to the scarcity
of manual annotations. Recent works on AU detection with self-supervised
learning have emerged to address this problem, aiming to learn meaningful AU
representations from numerous unlabeled data. However, most existing AU
detection works with self-supervised learning utilize global facial features
only, while AU-related properties such as locality and relevance are not fully
explored. In this paper, we propose a novel self-supervised framework for AU
detection with region and relation learning. In particular, an AU-related
attention map is utilized to guide the model to focus on AU-specific
regions, enhancing the integrity of AU local features. Meanwhile, an improved
Optimal Transport (OT) algorithm is introduced to exploit the correlation
characteristics among AUs. In addition, Swin Transformer is exploited to model
the long-distance dependencies within each AU region during feature learning.
The evaluation results on BP4D and DISFA demonstrate that our proposed method
is comparable or even superior to the state-of-the-art self-supervised learning
methods and supervised AU detection methods.
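The abstract names an improved Optimal Transport algorithm for exploiting AU correlations but does not spell it out; as a rough, hedged sketch of the underlying idea, entropy-regularized OT can be solved with Sinkhorn iterations to produce a soft correspondence between two sets of AU features. The uniform marginals, cost matrix, regularization strength `eps`, and toy features below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=100):
    """Entropy-regularized OT via Sinkhorn iterations.

    cost: (n, m) pairwise cost matrix between two sets of AU features.
    Returns a transport plan with (approximately) uniform marginals.
    """
    n, m = cost.shape
    K = np.exp(-cost / eps)        # Gibbs kernel
    r = np.ones(n) / n             # uniform source marginal (assumption)
    c = np.ones(m) / m             # uniform target marginal (assumption)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = r / (K @ v)            # scale rows toward marginal r
        v = c / (K.T @ u)          # scale columns toward marginal c
    return u[:, None] * K * v[None, :]

# Toy example: cost from squared distances between AU-region embeddings
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(5, 16))   # 5 AU-region features
feats_b = rng.normal(size=(5, 16))
cost = ((feats_a[:, None, :] - feats_b[None, :, :]) ** 2).sum(-1)
plan = sinkhorn(cost)
print(plan.shape)   # the (5, 5) plan acts as a soft AU correspondence
```

Entries of the resulting plan can be read as soft correlation weights between AU features; the paper's "improved" variant presumably modifies this basic scheme.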
Related papers
- Contrastive Learning of Person-independent Representations for Facial
Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in a two-fold manner.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
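The summary above does not give the contrastive objective; a common choice for within-clip contrastive learning is an InfoNCE-style loss that pulls an anchor toward a positive from the same clip and pushes it from negatives drawn elsewhere. The temperature, feature shapes, and sampling scheme below are illustrative assumptions, not this paper's exact loss:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor: the positive (same clip) should
    score higher than the negatives (other clips / identities)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                  # positive sits at index 0

rng = np.random.default_rng(1)
a = rng.normal(size=32)                       # anchor AU feature
negs = [rng.normal(size=32) for _ in range(8)]
loss_easy = info_nce(a, a + 0.01 * rng.normal(size=32), negs)  # near-duplicate positive
loss_hard = info_nce(a, rng.normal(size=32), negs)             # unrelated "positive"
print(loss_easy, loss_hard)
```

As expected, the loss is small when the positive is genuinely similar to the anchor and large when it is not, which is what drives the representation toward clip-consistent AU features.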
arXiv Detail & Related papers (2024-03-06T01:49:28Z)
- Local Region Perception and Relationship Learning Combined with Feature
Fusion for Facial Action Unit Detection [12.677143408225167]
We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW).
We propose a single-stage trained AU detection framework. Specifically, in order to effectively extract facial local region features related to AU detection, we use a local region perception module.
We also use a graph neural network-based relational learning module to capture the relationship between AUs.
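The summary mentions a graph neural network-based relational module but gives no architecture; a minimal sketch of the idea is a single graph-convolution step in which each AU node aggregates features from related AUs over a fixed co-occurrence adjacency. The adjacency, dimensions, and normalization below are assumptions for illustration only:

```python
import numpy as np

def gcn_layer(node_feats, adj, weight):
    """One graph-convolution step: each AU node averages features from
    related AUs via a symmetrically normalized adjacency, then applies
    a linear map and ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = adj_hat.sum(axis=1)
    norm = adj_hat / np.sqrt(np.outer(deg, deg))   # D^{-1/2} A D^{-1/2}
    return np.maximum(norm @ node_feats @ weight, 0.0)

rng = np.random.default_rng(2)
n_aus, d_in, d_out = 12, 64, 32
feats = rng.normal(size=(n_aus, d_in))            # per-AU region features
adj = (rng.random((n_aus, n_aus)) > 0.7).astype(float)
adj = np.maximum(adj, adj.T)                      # symmetric AU relation graph
w = rng.normal(size=(d_in, d_out)) * 0.1
out = gcn_layer(feats, adj, w)
print(out.shape)
```

Stacking a few such layers lets AU predictions for one region be informed by evidence from correlated regions (e.g. co-occurring brow and lid actions).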
arXiv Detail & Related papers (2023-03-15T11:59:24Z)
- FAN-Trans: Online Knowledge Distillation for Facial Action Unit
Detection [45.688712067285536]
Leveraging the online knowledge distillation framework, we propose the FAN-Trans method for AU detection.
Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences.
arXiv Detail & Related papers (2022-11-11T11:35:33Z)
- Attention Based Relation Network for Facial Action Units Recognition [8.522262699196412]
We propose a novel Attention Based Relation Network (ABRNet) for AU recognition.
ABRNet uses several relation learning layers to automatically capture different AU relations.
Our approach achieves state-of-the-art performance on the DISFA and DISFA+ datasets.
arXiv Detail & Related papers (2022-10-23T11:26:53Z)
- Weakly Supervised Regional and Temporal Learning for Facial Action Unit
Recognition [36.350407471391065]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
By incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning.
arXiv Detail & Related papers (2022-04-01T12:02:01Z)
- Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action
Unit Recognition [29.664359264758495]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
To enhance the discrimination of regional features with AU relation embedding, we design a task of RoI inpainting to recover the randomly cropped AU patches.
A single image based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
Based on these two self-supervised auxiliary tasks, local features, mutual relation and motion cues of AUs are better captured in the backbone network.
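The RoI-inpainting task described above crops AU patches and trains the network to recover them; the paper's exact cropping procedure is not given here, but the data-preparation step can be sketched as masking square patches around (assumed, landmark-derived) AU centers. Patch size, crop probability, and center locations below are illustrative assumptions:

```python
import numpy as np

def crop_au_patches(image, centers, patch=16, rng=None):
    """Zero out square patches around a random subset of AU centers.

    Returns the masked image (network input) and the boolean mask
    marking the cropped pixels (the inpainting target region)."""
    rng = rng or np.random.default_rng()
    masked = image.copy()
    mask = np.zeros(image.shape[:2], dtype=bool)
    h, w = image.shape[:2]
    for cy, cx in centers:
        if rng.random() < 0.5:            # crop each AU region with prob 0.5
            continue
        y0, x0 = max(cy - patch // 2, 0), max(cx - patch // 2, 0)
        y1, x1 = min(y0 + patch, h), min(x0 + patch, w)
        masked[y0:y1, x0:x1] = 0
        mask[y0:y1, x0:x1] = True
    return masked, mask

rng = np.random.default_rng(3)
img = rng.random((128, 128, 3))           # stand-in for an aligned face crop
centers = [(40, 40), (40, 88), (90, 64)]  # e.g. eye and mouth regions (assumed)
masked, mask = crop_au_patches(img, centers, rng=rng)
print(masked.shape, mask.sum())
```

A reconstruction loss restricted to the masked region then forces the backbone to encode local AU appearance well enough to fill the patch back in.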
arXiv Detail & Related papers (2021-07-30T02:39:45Z)
- Meta Auxiliary Learning for Facial Action Unit Detection [84.22521265124806]
We consider learning AU detection and facial expression recognition in a multi-task manner.
The performance of the AU detection task cannot always be enhanced due to negative transfer in the multi-task scenario.
We propose a Meta Auxiliary Learning method (MAL) that automatically selects highly related FE samples by learning adaptive weights for the training FE samples in a meta-learning manner.
arXiv Detail & Related papers (2021-05-14T02:28:40Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze datasets and automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- AU-Expression Knowledge Constrained Representation Learning for Facial
Expression Recognition [79.8779790682205]
We propose an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition.
We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods.
arXiv Detail & Related papers (2020-12-29T03:42:04Z)
- J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face
Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.