Cross-subject Action Unit Detection with Meta Learning and
Transformer-based Relation Modeling
- URL: http://arxiv.org/abs/2205.08787v1
- Date: Wed, 18 May 2022 08:17:59 GMT
- Title: Cross-subject Action Unit Detection with Meta Learning and
Transformer-based Relation Modeling
- Authors: Jiyuan Cao, Zhilei Liu, Yong Zhang
- Abstract summary: The paper proposes a meta-learning-based cross-subject AU detection model to eliminate the identity-caused differences.
A transformer-based relation learning module is introduced to learn the latent relations of multiple AUs.
On the two public datasets BP4D and DISFA, the method outperforms the state of the art.
- Score: 7.395396464857193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial Action Unit (AU) detection is a crucial task for emotion analysis from
facial movements. Appearance differences between subjects can be mistaken for
the changes brought by AUs, resulting in inaccurate detection. However, most
existing deep-learning-based AU detection methods do not consider the
identity information of different subjects. This paper proposes a
meta-learning-based cross-subject AU detection model to eliminate the
identity-caused differences. Besides, a transformer-based relation learning
module is introduced to learn the latent relations of multiple AUs. To be
specific, our proposed work is composed of two sub-tasks. The first sub-task is
meta-learning-based AU local region representation learning, called MARL, which
learns discriminative representation of local AU regions that incorporates the
shared information of multiple subjects and eliminates identity-caused
differences. The second sub-task takes the local AU region representations from
the first sub-task as input and adds relation learning based on the
transformer encoder architecture to capture AU relationships. The entire
training process is cascaded. Ablation study and visualization show that our
MARL can eliminate identity-caused differences, thus obtaining a robust and
generalized AU discriminative embedding representation. On the two public
datasets BP4D and DISFA, our method outperforms the state of the art,
improving the F1 score by 1.3% and 1.4%, respectively.
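The second sub-task can be pictured as self-attention over the per-AU region embeddings produced by MARL. The sketch below is a minimal illustration of that idea, assuming hypothetical dimensions and random projections in place of learned weights; it is not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def au_relation_attention(au_feats, rng):
    """Single-head self-attention over per-AU embeddings.

    au_feats: (num_aus, d) matrix, one row per AU local-region embedding
    (playing the role of the MARL output in the paper's pipeline).
    """
    num_aus, d = au_feats.shape
    # Random projections stand in for the transformer encoder's learned weights.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = au_feats @ Wq, au_feats @ Wk, au_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # (num_aus, num_aus) AU-relation weights
    return attn @ V, attn

rng = np.random.default_rng(0)
feats = rng.standard_normal((12, 16))  # e.g. 12 AUs, 16-dim embeddings (illustrative)
out, attn = au_relation_attention(feats, rng)
print(out.shape, attn.shape)  # (12, 16) (12, 12)
```

Each row of `attn` is a distribution over all AUs, which is how self-attention lets every AU's representation be refined by its latent relations to the others.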
Related papers
- Representation Learning and Identity Adversarial Training for Facial Behavior Understanding [3.350769246260559]
We show that subject identity provides a learning shortcut for the model, leading to sub-optimal AU predictions.
We propose Identity Adversarial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features.
Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective.
arXiv Detail & Related papers (2024-07-15T21:13:28Z) - Contrastive Learning of Person-independent Representations for Facial
Action Unit Detection [70.60587475492065]
We formulate the self-supervised AU representation learning signals in two-fold.
We contrastively learn the AU representation within a video clip and devise a cross-identity reconstruction mechanism to learn person-independent representations.
Our method outperforms other contrastive learning methods and significantly closes the performance gap between the self-supervised and supervised AU detection approaches.
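A contrastive signal of this kind is commonly formulated as an InfoNCE-style loss; the sketch below shows that generic formulation, not the paper's exact objective, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: each anchor's positive is the same-index row of
    `positives`; all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # cross-entropy at matching index

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32))
loss_matched = info_nce(x, x + 0.01 * rng.standard_normal((8, 32)))
loss_random = info_nce(x, rng.standard_normal((8, 32)))
print(loss_matched < loss_random)  # aligned pairs give a lower loss
```

The loss pulls each anchor toward its own positive and pushes it away from the other samples in the batch, which is the mechanism behind learning person-independent representations from paired views.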
arXiv Detail & Related papers (2024-03-06T01:49:28Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Siamese DETR [87.45960774877798]
We present Siamese DETR, a self-supervised pretraining approach for the Transformer architecture in DETR.
We consider learning view-invariant and detection-oriented representations simultaneously through two complementary tasks.
The proposed Siamese DETR achieves state-of-the-art transfer performance on COCO and PASCAL VOC detection.
arXiv Detail & Related papers (2023-03-31T15:29:25Z) - FAN-Trans: Online Knowledge Distillation for Facial Action Unit
Detection [45.688712067285536]
Leveraging the online knowledge distillation framework, we propose the FAN-Trans method for AU detection.
Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences.
arXiv Detail & Related papers (2022-11-11T11:35:33Z) - Learning Multi-dimensional Edge Feature-based AU Relation Graph for
Facial Action Unit Recognition [27.34564955127377]
The activations of Facial Action Units (AUs) mutually influence one another.
Existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display.
This paper proposes an AU relationship modelling approach that learns a unique graph to explicitly describe the relationship between each pair of AUs.
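A much cruder, statistics-based stand-in for such a learned pairwise graph is a conditional co-occurrence matrix computed from binary AU labels. The sketch below is only an illustration of the pairwise-relation idea under that simplifying assumption; all names and sizes are hypothetical.

```python
import numpy as np

def au_cooccurrence_graph(labels, eps=1e-8):
    """Build a simple AU relation matrix from binary AU activations.

    labels: (num_samples, num_aus) 0/1 matrix. Entry (i, j) of the result is
    P(AU_j active | AU_i active) -- a crude, statistics-based stand-in for
    learned pairwise edge features.
    """
    labels = labels.astype(float)
    co = labels.T @ labels               # joint activation counts
    counts = np.diag(co).copy()          # per-AU activation counts
    return co / (counts[:, None] + eps)  # row-normalise by AU frequency

rng = np.random.default_rng(1)
y = (rng.random((200, 6)) < 0.3).astype(int)  # 200 samples, 6 hypothetical AUs
y[:, 1] = y[:, 0]                             # force AU1 to co-occur with AU0
G = au_cooccurrence_graph(y)
print(G.shape)   # (6, 6)
print(G[0, 1])   # close to 1.0: AU1 fires whenever AU0 does
```

Learned approaches replace these fixed statistics with edge features trained per facial display, but the matrix view makes clear what "relationship between each pair of AUs" means.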
arXiv Detail & Related papers (2022-05-02T03:38:00Z) - Weakly Supervised Regional and Temporal Learning for Facial Action Unit
Recognition [36.350407471391065]
We propose two auxiliary AU related tasks to bridge the gap between limited annotations and the model performance.
A single-image-based optical flow estimation task is proposed to leverage the dynamic change of facial muscles.
By incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning.
arXiv Detail & Related papers (2022-04-01T12:02:01Z) - Meta Auxiliary Learning for Facial Action Unit Detection [84.22521265124806]
We consider learning AU detection and facial expression recognition in a multi-task manner.
The performance of the AU detection task cannot be always enhanced due to the negative transfer in the multi-task scenario.
We propose a Meta Auxiliary Learning method (MAL) that automatically selects highly related FE samples by learning adaptive weights for the training FE samples in a meta-learning manner.
arXiv Detail & Related papers (2021-05-14T02:28:40Z) - Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performances in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z) - Deep Multi-task Multi-label CNN for Effective Facial Attribute
Classification [53.58763562421771]
We propose a novel deep multi-task multi-label CNN, termed DMM-CNN, for effective Facial Attribute Classification (FAC).
Specifically, DMM-CNN jointly optimizes two closely-related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning.
Two different network architectures are respectively designed to extract features for two groups of attributes, and a novel dynamic weighting scheme is proposed to automatically assign the loss weight to each facial attribute during training.
arXiv Detail & Related papers (2020-02-10T12:34:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.