AU-Supervised Convolutional Vision Transformers for Synthetic Facial
Expression Recognition
- URL: http://arxiv.org/abs/2207.09777v2
- Date: Fri, 22 Jul 2022 04:24:52 GMT
- Title: AU-Supervised Convolutional Vision Transformers for Synthetic Facial
Expression Recognition
- Authors: Shuyi Mao, Xinpeng Li, Junyao Chen, Xiaojiang Peng
- Abstract summary: The paper describes our proposed methodology for the six basic expression classification track of Affective Behavior Analysis in-the-wild (ABAW) Competition 2022.
Because of the ambiguous of the synthetic data and the objectivity of the facial Action Unit (AU), we resort to the AU information for performance boosting.
- Score: 12.661683851729679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper describes our proposed methodology for the six basic expression
classification track of Affective Behavior Analysis in-the-wild (ABAW)
Competition 2022. In Learing from Synthetic Data(LSD) task, facial expression
recognition (FER) methods aim to learn the representation of expression from
the artificially generated data and generalise to real data. Because of the
ambiguous of the synthetic data and the objectivity of the facial Action Unit
(AU), we resort to the AU information for performance boosting, and make
contributions as follows. First, to adapt the model to synthetic scenarios, we
use the knowledge from pre-trained large-scale face recognition data. Second,
we propose a conceptually-new framework, termed as AU-Supervised Convolutional
Vision Transformers (AU-CVT), which clearly improves the performance of FER by
jointly training auxiliary datasets with AU or pseudo AU labels. Our AU-CVT
achieved F1 score as $0.6863$, accuracy as $0.7433$ on the validation set. The
source code of our work is publicly available online:
https://github.com/msy1412/ABAW4
Related papers
- Towards Unified Facial Action Unit Recognition Framework by Large Language Models [10.752099675130276]
We propose AU-LLaVA, the first unified AU recognition framework based on the Large Language Model (LLM)
AU-LLaVA consists of a visual encoder, a linear projector layer, and a pre-trained LLM.
On the BP4D and DISFA datasets, AU-LLaVA delivers the most accurate recognition results for nearly half of the AUs.
arXiv Detail & Related papers (2024-09-13T00:26:09Z) - UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos [83.48170683672427]
UniLearn is a unified learning paradigm that integrates static facial expression recognition data to enhance DFER task.
UniLearn consistently state-of-the-art performance on FERV39K, MAFW, and DFEW benchmarks, with weighted average recall (WAR) of 53.65%, 58.44%, and 76.68%, respectively.
arXiv Detail & Related papers (2024-09-10T01:57:57Z) - Representation Learning and Identity Adversarial Training for Facial Behavior Understanding [3.350769246260559]
We show that subject identity provides a shortcut learning for the model and leads to sub-optimal solutions to AU predictions.
We propose Identity Adrial Training (IAT) and demonstrate that a strong IAT regularization is necessary to learn identity-invariant features.
Our proposed methods, Facial Masked Autoencoder (FMAE) and IAT, are simple, generic and effective.
arXiv Detail & Related papers (2024-07-15T21:13:28Z) - Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z) - Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data [104.45155847778584]
This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn)
FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations.
arXiv Detail & Related papers (2024-04-16T08:15:10Z) - SDFR: Synthetic Data for Face Recognition Competition [51.9134406629509]
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns.
Recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets.
This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones.
arXiv Detail & Related papers (2024-04-06T10:30:31Z) - Boosting Continuous Emotion Recognition with Self-Pretraining using Masked Autoencoders, Temporal Convolutional Networks, and Transformers [3.951847822557829]
We tackle the Valence-Arousal (VA) Estimation Challenge, Expression (Expr) Classification Challenge, and Action Unit (AU) Detection Challenge.
Our study advocates a novel approach aimed at refining continuous emotion recognition.
We achieve this by pre-training with Masked Autoencoders (MAE) on facial datasets, followed by fine-tuning on the aff-wild2 dataset annotated with expression (Expr) labels.
arXiv Detail & Related papers (2024-03-18T03:28:01Z) - Multi-modal Facial Affective Analysis based on Masked Autoencoder [7.17338843593134]
We introduce our submission to the CVPR 2023: ABAW5 competition: Affective Behavior Analysis in-the-wild.
Our approach involves several key components. First, we utilize the visual information from a Masked Autoencoder(MAE) model that has been pre-trained on a large-scale face image dataset in a self-supervised manner.
Our approach achieves impressive results in the ABAW5 competition, with an average F1 score of 55.49% and 41.21% in the AU and EXPR tracks, respectively.
arXiv Detail & Related papers (2023-03-20T03:58:03Z) - AU-Aware Vision Transformers for Biased Facial Expression Recognition [17.00557858587472]
We experimentally show that the naive joint training of multiple FER datasets is harmful to the FER performance of individual datasets.
We propose a simple yet conceptually-new framework, AU-aware Vision Transformer (AU-ViT)
Our AU-ViT achieves state-of-the-art performance on three popular datasets, namely 91.10% on RAF-DB, 65.59% on AffectNet, and 90.15% on FERPlus.
arXiv Detail & Related papers (2022-11-12T08:58:54Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR)
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experiment results show our model considerably improves upon the state of the arts in ZSAR, reaching encouraging top-1 accuracy on UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.