SwinFace: A Multi-task Transformer for Face Recognition, Expression
Recognition, Age Estimation and Attribute Estimation
- URL: http://arxiv.org/abs/2308.11509v1
- Date: Tue, 22 Aug 2023 15:38:39 GMT
- Title: SwinFace: A Multi-task Transformer for Face Recognition, Expression
Recognition, Age Estimation and Attribute Estimation
- Authors: Lixiong Qin, Mei Wang, Chao Deng, Ke Wang, Xi Chen, Jiani Hu, Weihong
Deng
- Abstract summary: This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis subnet.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
- Score: 60.94239810407917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, vision transformers have been introduced into face
recognition and analysis and have achieved performance breakthroughs. However,
most previous methods generally train a single model or an ensemble of models
to perform the desired task, which ignores the synergy among different tasks
and fails to achieve improved prediction accuracy, increased data efficiency,
and reduced training time. This paper presents a multi-purpose algorithm for
simultaneous face recognition, facial expression recognition, age estimation,
and face attribute estimation (40 attributes including gender) based on a
single Swin Transformer. Our design, the SwinFace, consists of a single shared
backbone together with a subnet for each set of related tasks. To address the
conflicts among multiple tasks and meet the different demands of tasks, a
Multi-Level Channel Attention (MLCA) module is integrated into each
task-specific analysis subnet, which can adaptively select the features from
optimal levels and channels to perform the desired tasks. Extensive experiments
show that the proposed model has a better understanding of the face and
achieves excellent performance for all tasks. In particular, it achieves 90.97%
accuracy on RAF-DB and 0.22 $\epsilon$-error on CLAP2015, which are
state-of-the-art results on facial expression recognition and age estimation
respectively. The code and models will be made publicly available at
https://github.com/lxq1000/SwinFace.
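For intuition, the following is a minimal, hypothetical sketch of the architecture the abstract describes: a shared backbone exposing features at several levels, and a per-task analysis subnet whose channel-attention block adaptively weights those levels and channels. It is not the released SwinFace code (see the repository linked above); the module names, channel widths, and feature resolutions are illustrative assumptions.

```python
# Illustrative sketch only -- not the official SwinFace implementation.
# Assumed: a backbone that returns per-stage feature maps, and an MLCA-style
# block that gates each stage's channels before a task-specific head.
import torch
import torch.nn as nn

class MultiLevelChannelAttention(nn.Module):
    """Fuses multi-level features by learning per-level, per-channel gates."""
    def __init__(self, channels_per_level, out_dim, reduction=4):
        super().__init__()
        # Project every level to a common channel width before gating.
        self.projs = nn.ModuleList(
            [nn.Conv2d(c, out_dim, kernel_size=1) for c in channels_per_level]
        )
        total = out_dim * len(channels_per_level)
        # Squeeze-and-excitation style gate over the concatenated descriptors.
        self.gate = nn.Sequential(
            nn.Linear(total, total // reduction), nn.ReLU(inplace=True),
            nn.Linear(total // reduction, total), nn.Sigmoid(),
        )

    def forward(self, feats):  # feats: list of (B, C_i, H_i, W_i) tensors
        pooled = [proj(f).mean(dim=(2, 3)) for proj, f in zip(self.projs, feats)]
        gates = self.gate(torch.cat(pooled, dim=1)).chunk(len(feats), dim=1)
        # Gate each level's pooled descriptor, then sum across levels.
        return sum(g * p for g, p in zip(gates, pooled))  # (B, out_dim)

class TaskSubnet(nn.Module):
    """One task-specific analysis subnet: MLCA-style fusion + linear head."""
    def __init__(self, channels_per_level, out_dim, num_outputs):
        super().__init__()
        self.attn = MultiLevelChannelAttention(channels_per_level, out_dim)
        self.head = nn.Linear(out_dim, num_outputs)

    def forward(self, feats):
        return self.head(self.attn(feats))

if __name__ == "__main__":
    # Stand-in for four backbone stages on a 112x112 face crop (shapes assumed).
    chans = [96, 192, 384, 768]
    feats = [torch.randn(2, c, s, s) for c, s in zip(chans, [28, 14, 7, 4])]
    subnets = {
        "expression": TaskSubnet(chans, 256, 7),    # e.g. 7 basic expressions
        "age":        TaskSubnet(chans, 256, 101),  # age distribution over 0-100
        "attributes": TaskSubnet(chans, 256, 40),   # 40 binary attributes
    }
    for name, subnet in subnets.items():
        print(name, subnet(feats).shape)
```

The point of the gating step is that each subnet can emphasize different backbone levels: identity tends to rely on deeper, more abstract features, while expression and attribute cues often benefit from shallower, more local ones.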
Related papers
- Task-adaptive Q-Face [75.15668556061772]
We propose a novel task-adaptive multi-task face analysis method named Q-Face.
Q-Face simultaneously performs multiple face analysis tasks with a unified model.
Our method achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation.
arXiv Detail & Related papers (2024-05-15T03:13:11Z)
- FaceXFormer: A Unified Transformer for Facial Analysis [59.94066615853198]
FaceXFormer is an end-to-end unified transformer model for a range of facial analysis tasks.
Our model effectively handles images "in-the-wild," demonstrating its robustness and generalizability across eight different tasks.
arXiv Detail & Related papers (2024-03-19T17:58:04Z)
- Faceptor: A Generalist Model for Face Perception [52.8066001012464]
Faceptor adopts a well-designed single-encoder dual-decoder architecture.
Incorporating Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
arXiv Detail & Related papers (2024-03-14T15:42:31Z)
- MiVOLO: Multi-input Transformer for Age and Gender Estimation [0.0]
We present MiVOLO, a straightforward approach for age and gender estimation using the latest vision transformer.
Our method integrates both tasks into a unified dual input/output model.
We compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges.
arXiv Detail & Related papers (2023-07-10T14:58:10Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- Facial expression and attributes recognition based on multi-task learning of lightweight neural networks [9.162936410696409]
We examine the multi-task training of lightweight convolutional neural networks for face identification and classification of facial attributes.
It is shown that it is still necessary to fine-tune these networks in order to predict facial expressions.
Several models are presented based on MobileNet, EfficientNet and RexNet architectures.
arXiv Detail & Related papers (2021-03-31T14:21:04Z)
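A common thread in the papers listed above (SwinFace, MulT, and the lightweight multi-task CNNs) is hard parameter sharing: one backbone feeds several lightweight task heads, and the per-task losses are combined during training. The sketch below illustrates that pattern under simplifying assumptions; the tiny convolutional encoder, head names, and unweighted loss sum are placeholders, not any paper's actual configuration.

```python
# Minimal sketch of hard parameter sharing -- not any paper's released code.
import torch
import torch.nn as nn

class SharedMultiTaskModel(nn.Module):
    def __init__(self, feat_dim=128, num_expressions=7, num_attributes=40):
        super().__init__()
        # Stand-in encoder; a real system would use a Swin or ViT backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.expr_head = nn.Linear(feat_dim, num_expressions)  # classification
        self.age_head = nn.Linear(feat_dim, 1)                 # regression
        self.attr_head = nn.Linear(feat_dim, num_attributes)   # multi-label

    def forward(self, x):
        z = self.encoder(x)  # shared representation for every task
        return self.expr_head(z), self.age_head(z).squeeze(1), self.attr_head(z)

def multitask_loss(outputs, targets):
    expr_logits, age_pred, attr_logits = outputs
    expr_t, age_t, attr_t = targets
    # Per-task losses; real systems typically weight or balance these terms.
    l_expr = nn.functional.cross_entropy(expr_logits, expr_t)
    l_age = nn.functional.l1_loss(age_pred, age_t)
    l_attr = nn.functional.binary_cross_entropy_with_logits(attr_logits, attr_t)
    return l_expr + l_age + l_attr

if __name__ == "__main__":
    model = SharedMultiTaskModel()
    x = torch.randn(4, 3, 112, 112)
    targets = (torch.randint(0, 7, (4,)),            # expression labels
               torch.rand(4) * 100,                  # ages
               torch.randint(0, 2, (4, 40)).float()) # attribute labels
    loss = multitask_loss(model(x), targets)
    loss.backward()
    print(float(loss))
```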