AU-Aware Vision Transformers for Biased Facial Expression Recognition
- URL: http://arxiv.org/abs/2211.06609v1
- Date: Sat, 12 Nov 2022 08:58:54 GMT
- Title: AU-Aware Vision Transformers for Biased Facial Expression Recognition
- Authors: Shuyi Mao, Xinpeng Li, Qingyang Wu, and Xiaojiang Peng
- Abstract summary: We experimentally show that the naive joint training of multiple FER datasets is harmful to the FER performance of individual datasets.
We propose a simple yet conceptually-new framework, the AU-aware Vision Transformer (AU-ViT).
Our AU-ViT achieves state-of-the-art performance on three popular datasets, namely 91.10% on RAF-DB, 65.59% on AffectNet, and 90.15% on FERPlus.
- Score: 17.00557858587472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Studies have proven that domain bias and label bias exist in different Facial
Expression Recognition (FER) datasets, making it hard to improve the
performance of a specific dataset by adding other datasets. For the FER bias
issue, recent research mainly focuses on the cross-domain setting with advanced
domain adaptation algorithms. This paper addresses another problem: how to boost
FER performance by leveraging cross-domain datasets. Unlike coarse and biased
expression labels, facial Action Units (AUs) are fine-grained and objective, as
suggested by psychological studies. Motivated by this, we resort to
the AU information of different FER datasets for performance boosting and make
contributions as follows. First, we experimentally show that the naive joint
training of multiple FER datasets is harmful to the FER performance of
individual datasets. We further introduce expression-specific mean images and
AU cosine distances to measure FER dataset bias. This measurement yields
conclusions consistent with the performance degradation observed in joint training. Second,
we propose a simple yet conceptually-new framework, AU-aware Vision Transformer
(AU-ViT). It improves the performance of individual datasets by jointly
training auxiliary datasets with AU or pseudo-AU labels. We also find that the
AU-ViT is robust to real-world occlusions. Moreover, for the first time, we
prove that a carefully-initialized ViT achieves comparable performance to
advanced deep convolutional networks. Our AU-ViT achieves state-of-the-art
performance on three popular datasets, namely 91.10% on RAF-DB, 65.59% on
AffectNet, and 90.15% on FERPlus. The code and models will be released soon.
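
To make the joint-training idea concrete, the sketch below (PyTorch) shows one plausible way to attach an auxiliary AU head to a ViT backbone and combine an expression loss with a (pseudo-)AU loss. The abstract does not specify the actual AU-ViT architecture, so the backbone choice (torchvision's ViT-B/16), the 768-dimensional feature, the head placement, the AU count, and the loss weighting are illustrative assumptions, not the authors' released implementation.

# Minimal sketch, assuming a torchvision ViT-B/16 backbone and a simple
# multi-task setup; NOT the authors' AU-ViT implementation.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

class AUAwareViTSketch(nn.Module):
    """ViT backbone with an expression head plus an auxiliary AU head."""
    def __init__(self, num_expressions: int = 7, num_aus: int = 12):
        super().__init__()
        backbone = vit_b_16(weights="IMAGENET1K_V1")  # careful initialization, echoing the abstract's point
        backbone.heads = nn.Identity()                # drop the ImageNet classifier, keep the 768-d feature
        self.backbone = backbone
        self.expr_head = nn.Linear(768, num_expressions)  # expression logits
        self.au_head = nn.Linear(768, num_aus)            # multi-label AU logits

    def forward(self, x: torch.Tensor):
        feat = self.backbone(x)                        # (B, 768) class-token feature
        return self.expr_head(feat), self.au_head(feat)

def joint_loss(expr_logits, au_logits, expr_labels, au_labels, au_weight: float = 1.0):
    """Cross-entropy on expression labels + BCE on (pseudo-)AU labels."""
    expr_loss = nn.functional.cross_entropy(expr_logits, expr_labels)
    au_loss = nn.functional.binary_cross_entropy_with_logits(au_logits, au_labels)
    return expr_loss + au_weight * au_loss

# Dummy batch for shape checking; in practice the primary FER dataset supplies
# expression labels while auxiliary datasets supply AU or pseudo-AU labels.
model = AUAwareViTSketch()
images = torch.randn(4, 3, 224, 224)
expr_labels = torch.randint(0, 7, (4,))
au_labels = torch.randint(0, 2, (4, 12)).float()
expr_logits, au_logits = model(images)
loss = joint_loss(expr_logits, au_logits, expr_labels, au_labels)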
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition [12.661683851729679]
The paper describes our proposed methodology for the six basic expression classification track of Affective Behavior Analysis in-the-wild (ABAW) Competition 2022.
Because of the ambiguity of the synthetic data and the objectivity of facial Action Units (AUs), we resort to AU information for performance boosting.
arXiv Detail & Related papers (2022-07-20T09:33:39Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- AU-Guided Unsupervised Domain Adaptive Facial Expression Recognition [21.126514122636966]
This paper proposes an AU-guided unsupervised Domain Adaptive FER framework to relieve the annotation bias between different FER datasets.
To achieve domain-invariant compact features, we utilize an AU-guided triplet training which randomly collects anchor-positive-negative triplets on both domains with AUs.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
We further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)