Multi-label Transformer for Action Unit Detection
- URL: http://arxiv.org/abs/2203.12531v1
- Date: Wed, 23 Mar 2022 16:46:09 GMT
- Title: Multi-label Transformer for Action Unit Detection
- Authors: Gauthier Tallec, Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
- Abstract summary: Action Unit (AU) Detection is the branch of affective computing that aims at recognizing unitary facial muscular movements.
We present our submission to the ABAW3 challenge.
In a nutshell, we applied a multi-label detection transformer that leverages multi-head attention to learn which part of the face image is most relevant for predicting each AU.
- Score: 7.09232719022402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action Unit (AU) Detection is the branch of affective computing that aims at recognizing unitary facial muscular movements. It is key to unlocking unbiased computational face representations and has therefore aroused great interest in the past few years. One of the main obstacles toward building efficient deep learning based AU detection systems is the scarcity of facial image databases annotated by AU experts. In that regard, the ABAW challenge paves the way toward better AU detection, as it involves a ~2M-frame AU-annotated dataset. In this paper, we present our submission to the ABAW3 challenge. In a nutshell, we applied a multi-label detection transformer that leverages multi-head attention to learn which part of the face image is most relevant for predicting each AU.
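Since the abstract names the mechanism but gives no architecture details, here is a minimal PyTorch sketch of the general multi-label detection transformer idea: one learnable query per AU cross-attends to backbone features, and each attended query is decoded into an independent binary logit. This is an illustration of the technique, not the authors' implementation; all module names, dimensions, and hyper-parameters are assumptions.

```python
import torch
import torch.nn as nn

class MultiLabelAUTransformer(nn.Module):
    """Minimal sketch: one learnable query per AU cross-attends to
    image features, and each attended query is decoded into a
    binary presence logit for its AU."""

    def __init__(self, num_aus=12, feat_dim=256, num_heads=8, num_layers=2):
        super().__init__()
        # One learnable embedding per Action Unit ("AU queries").
        self.au_queries = nn.Parameter(torch.randn(num_aus, feat_dim))
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        # Independent binary classifier per AU (multi-label setting).
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, feat_map):
        # feat_map: (B, feat_dim, H, W) from any CNN backbone.
        b, d, h, w = feat_map.shape
        memory = feat_map.flatten(2).transpose(1, 2)   # (B, H*W, d)
        queries = self.au_queries.unsqueeze(0).expand(b, -1, -1)
        attended = self.decoder(queries, memory)       # (B, num_aus, d)
        return self.head(attended).squeeze(-1)         # (B, num_aus) logits

# Multi-label training uses an independent sigmoid/BCE per AU:
model = MultiLabelAUTransformer()
feats = torch.randn(2, 256, 7, 7)                      # dummy backbone output
logits = model(feats)
labels = torch.randint(0, 2, (2, 12)).float()
loss = nn.BCEWithLogitsLoss()(logits, labels)
```

Per-AU queries (rather than a single pooled feature) are what let the multi-head attention maps specialize on different facial regions for different AUs.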
Related papers
- Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample [53.23474626420103]
Facial action unit (AU) detection remains a challenging task, due to the subtlety, dynamics, and diversity of AUs.
We propose a novel AU detection framework called AC2D by adaptively constraining the self-attention weight distribution.
Our method achieves competitive performance compared to state-of-the-art AU detection approaches on challenging benchmarks.
arXiv Detail & Related papers (2024-10-02T05:51:24Z)
- Representation Learning and Identity Adversarial Training for Facial Behavior Understanding [3.350769246260559]
We show that subject identity provides a shortcut learning signal for the model and leads to sub-optimal AU predictions.
We propose Identity Adversarial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features.
Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic, and effective.
arXiv Detail & Related papers (2024-07-15T21:13:28Z)
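The summary above does not say how the adversarial regularization is implemented; a common way to obtain identity-invariant features is a gradient reversal layer in front of an identity classifier, sketched below under that assumption. All function and argument names are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity map on the forward pass; reversed, scaled gradient
    on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def adversarial_losses(features, au_logits, au_labels, id_logits_fn,
                       id_labels, lam=1.0):
    # AU branch: standard multi-label BCE on the shared features.
    au_loss = nn.BCEWithLogitsLoss()(au_logits, au_labels)
    # Identity branch: classified from gradient-reversed features, so
    # minimizing its loss pushes the encoder toward identity invariance.
    rev = GradReverse.apply(features, lam)
    id_loss = nn.CrossEntropyLoss()(id_logits_fn(rev), id_labels)
    return au_loss + id_loss
```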
- Contrastive Learning of Person-independent Representations for Facial Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signal in a two-fold manner.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between self-supervised and supervised AU detection approaches.
arXiv Detail & Related papers (2024-03-06T01:49:28Z)
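As a loose illustration of the within-clip contrastive signal (the cross-identity reconstruction branch is omitted), an InfoNCE-style loss that treats two frames from the same clip as a positive pair might look as follows; every detail here is an assumption rather than the paper's recipe.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: z_a[i] and z_b[i] are embeddings of two
    frames sampled from the same video clip (a positive pair); all
    other clips in the batch serve as negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Usage: embed two frames per clip with a shared encoder, then
# loss = clip_contrastive_loss(encoder(frames_a), encoder(frames_b))
```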
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge.
Under this condition, the occluded representation can be spontaneously well aligned in a selected subspace.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Cross-subject Action Unit Detection with Meta Learning and Transformer-based Relation Modeling [7.395396464857193]
The paper proposes a meta-learning-based cross-subject AU detection model to eliminate the identity-caused differences.
A transformer-based relation learning module is introduced to learn the latent relations of multiple AUs.
Our results show that our method outperforms the state of the art on the two public datasets BP4D and DISFA.
arXiv Detail & Related papers (2022-05-18T08:17:59Z)
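The relation learning module is described only at a high level; a plausible reading is self-attention over per-AU feature vectors, as in this hedged sketch (names and dimensions are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class AURelationModule(nn.Module):
    """Sketch: self-attention over per-AU feature vectors to capture
    latent relations (e.g., co-occurrence) between AUs."""
    def __init__(self, feat_dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, au_feats):
        # au_feats: (B, num_aus, feat_dim), one vector per AU.
        related, _ = self.attn(au_feats, au_feats, au_feats)
        return self.norm(au_feats + related)   # residual + norm
```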
- An Attention-based Method for Action Unit Detection at the 3rd ABAW Competition [6.229820412732652]
This paper describes our submission to the third Affective Behavior Analysis in-the-wild (ABAW) competition 2022.
We propose a method for detecting facial action units in video.
We achieved a macro F1 score of 0.48 on the ABAW challenge validation set compared to 0.39 from the baseline model.
arXiv Detail & Related papers (2022-03-23T14:07:39Z)
- Attention Aware Wavelet-based Detection of Morphed Face Images [18.22557507385582]
We propose a wavelet-based morph detection methodology which adopts an end-to-end trainable soft attention mechanism.
We evaluate the performance of the proposed framework using three datasets: VISAPP17, LMA, and MorGAN.
arXiv Detail & Related papers (2021-06-29T19:29:19Z)
- Meta Auxiliary Learning for Facial Action Unit Detection [84.22521265124806]
We consider learning AU detection and facial expression recognition in a multi-task manner.
The performance of the AU detection task cannot always be enhanced due to negative transfer in the multi-task scenario.
We propose a Meta Auxiliary Learning method (MAL) that automatically selects highly related FE samples by learning adaptive weights for the training FE samples in a meta-learning manner.
arXiv Detail & Related papers (2021-05-14T02:28:40Z)
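A hedged sketch of that sample-weighting idea (not MAL's actual algorithm): the auxiliary facial-expression (FE) loss carries one learnable weight per training sample, and a meta step, only indicated in a comment here, would update those weights from the AU validation loss. All names below are assumptions.

```python
import torch
import torch.nn as nn

def weighted_auxiliary_loss(au_loss, fe_logits, fe_labels, sample_w):
    """Sketch of the meta-weighting idea: each facial-expression (FE)
    training sample gets a learnable weight; highly related samples
    should receive large weights, unrelated ones near-zero weights."""
    per_sample = nn.CrossEntropyLoss(reduction="none")(fe_logits, fe_labels)
    # Normalize weights so the auxiliary loss scale stays comparable.
    w = torch.softmax(sample_w, dim=0)
    return au_loss + (w * per_sample).sum()

# In the meta step, sample_w would be updated by the gradient of the
# AU validation loss through a one-step look-ahead of the model
# parameters (omitted here for brevity).
```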
- Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performance in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z)
- Unsupervised Facial Action Unit Intensity Estimation via Differentiable Optimization [45.07851622835555]
We propose an unsupervised framework GE-Net for facial AU intensity estimation from a single image.
Our framework performs differentiable optimization, which iteratively updates the facial parameters to match the input image.
Experimental results demonstrate that our method can achieve state-of-the-art results compared with existing methods.
arXiv Detail & Related papers (2020-04-13T12:56:28Z)
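The abstract describes fitting facial parameters to the input image by iterative differentiable updates; a generic gradient-based fitting loop in that spirit could look like the following, where `render_fn` is a placeholder for a pretrained differentiable face generator, not GE-Net's actual component.

```python
import torch

def fit_au_intensities(image, render_fn, num_aus=12, steps=200, lr=0.05):
    """Generic differentiable-fitting loop: iteratively update facial
    parameters (here, AU intensities) so a differentiable generator's
    output matches the input image."""
    au_params = torch.zeros(num_aus, requires_grad=True)
    optimizer = torch.optim.Adam([au_params], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        rendered = render_fn(au_params)              # differentiable render
        loss = torch.mean((rendered - image) ** 2)   # photometric error
        loss.backward()                              # gradients w.r.t. AUs
        optimizer.step()
    return au_params.detach().clamp(min=0.0)         # intensities are >= 0
```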
- J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention [57.51255553918323]
We propose a novel end-to-end deep learning framework for joint AU detection and face alignment.
Our framework significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks.
arXiv Detail & Related papers (2020-03-18T12:50:19Z)