Faceptor: A Generalist Model for Face Perception
- URL: http://arxiv.org/abs/2403.09500v1
- Date: Thu, 14 Mar 2024 15:42:31 GMT
- Title: Faceptor: A Generalist Model for Face Perception
- Authors: Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng
- Abstract summary: Faceptor is proposed with a well-designed single-encoder dual-decoder architecture.
Introducing Layer-Attention into Faceptor enables the model to adaptively select features from the optimal layers for each task.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
- Score: 52.8066001012464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the comprehensive research conducted on various face analysis tasks, there is growing interest among researchers in developing a unified approach to face perception. Existing methods mainly discuss unified representation and training, and lack task extensibility and application efficiency. To tackle this issue, we focus on the unified model structure, exploring a face generalist model. As an intuitive design, Naive Faceptor lets tasks with the same output shape and granularity share the structural design of a standardized output head, achieving improved task extensibility. Furthermore, Faceptor is proposed with a well-designed single-encoder dual-decoder architecture, allowing task-specific queries to represent newly introduced semantics. This design enhances the unification of the model structure while improving application efficiency in terms of storage overhead. Additionally, we introduce Layer-Attention into Faceptor, enabling the model to adaptively select features from the optimal layers to perform the desired tasks. Through joint training on 13 face perception datasets, Faceptor achieves exceptional performance in facial landmark localization, face parsing, age estimation, expression recognition, binary attribute classification, and face recognition, matching or surpassing specialized methods in most tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition. The code and models will be made publicly available at https://github.com/lxq1000/Faceptor.
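To make the Layer-Attention idea concrete, here is a minimal PyTorch sketch of per-task attention over encoder layers. Everything below (the class name, zero-initialized logits, softmax weighting over layers) is an assumption for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LayerAttention(nn.Module):
    """Minimal sketch: learn per-task softmax weights over encoder layers
    and return a weighted sum of the layer features. Illustrative only;
    this is not the Faceptor implementation."""

    def __init__(self, num_layers: int, num_tasks: int):
        super().__init__()
        # One learnable logit per (task, layer) pair.
        self.logits = nn.Parameter(torch.zeros(num_tasks, num_layers))

    def forward(self, layer_feats: list, task_id: int) -> torch.Tensor:
        # layer_feats: list of [batch, tokens, dim] tensors, one per encoder layer.
        stacked = torch.stack(layer_feats, dim=0)            # [L, B, T, D]
        weights = self.logits[task_id].softmax(dim=0)        # [L]
        return (weights.view(-1, 1, 1, 1) * stacked).sum(0)  # [B, T, D]

# Usage: aggregate 12 ViT layer outputs for task 0 (e.g., age estimation).
layers = [torch.randn(2, 196, 768) for _ in range(12)]
fused = LayerAttention(num_layers=12, num_tasks=6)(layers, task_id=0)
print(fused.shape)  # torch.Size([2, 196, 768])
```

Because the weights are per task, each task is free to lean on shallow layers (useful for dense predictions such as parsing) or deep layers (useful for identity), which is one plausible reading of "adaptively select features from the optimal layers."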
Related papers
- Task-adaptive Q-Face [75.15668556061772]
We propose a novel task-adaptive multi-task face analysis method named Q-Face.
Q-Face simultaneously performs multiple face analysis tasks with a unified model.
Our method achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation.
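Both Faceptor's task-specific queries and Q-Face's task-adaptive unified model fit a common pattern in which learned task queries cross-attend to shared image features. The sketch below illustrates only that generic pattern; the class name, query count, and attention configuration are assumptions, not either paper's actual design.

```python
import torch
import torch.nn as nn

class TaskQueryDecoder(nn.Module):
    """Sketch of a task-query decoder: learned queries per task cross-attend
    to shared encoder tokens. A generic pattern, not either paper's design."""

    def __init__(self, dim: int, num_tasks: int, queries_per_task: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tasks, queries_per_task, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, feats: torch.Tensor, task_id: int) -> torch.Tensor:
        # feats: [B, T, D] shared encoder tokens; queries pull task-specific info.
        q = self.queries[task_id].unsqueeze(0).expand(feats.size(0), -1, -1)
        out, _ = self.cross_attn(q, feats, feats)  # [B, queries_per_task, D]
        return out

feats = torch.randn(2, 196, 768)
dec = TaskQueryDecoder(dim=768, num_tasks=6)
print(dec(feats, task_id=3).shape)  # torch.Size([2, 4, 768])
```

Under this pattern, extending the model to a new task amounts to appending a new row of queries, which is one way such designs gain task extensibility without duplicating the backbone.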
arXiv Detail & Related papers (2024-05-15T03:13:11Z)
- FaceXFormer: A Unified Transformer for Facial Analysis [59.94066615853198]
FaceXFormer is an end-to-end unified transformer model for a range of facial analysis tasks.
Our model effectively handles images "in-the-wild," demonstrating its robustness and generalizability across eight different tasks.
arXiv Detail & Related papers (2024-03-19T17:58:04Z)
- Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition [1.4374467687356276]
This paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method with a multi-view Fusion Attention mechanism for expression classification.
We suggest easy-to-implement, training-free frameworks that highlight key facial features, to determine whether such features can serve as guides for the model.
The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset.
arXiv Detail & Related papers (2024-03-19T16:21:47Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis subnet.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
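The MLCA idea, letting each task re-weight channels drawn from several feature levels, can be sketched with squeeze-and-excitation-style channel attention applied per level. The structure below is an illustrative guess under that assumption, not SwinFace's actual module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatially, excite per channel."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W] -> per-channel weights in [0, 1], applied to x.
        w = self.fc(x.mean(dim=(2, 3)))  # [B, C]
        return x * w[:, :, None, None]

class MultiLevelChannelAttention(nn.Module):
    """Sketch of a task-specific MLCA head: re-weight each level's channels,
    then pool and concatenate across levels. Illustrative, not SwinFace's code."""

    def __init__(self, channels_per_level):
        super().__init__()
        self.attn = nn.ModuleList(ChannelAttention(c) for c in channels_per_level)

    def forward(self, feats) -> torch.Tensor:
        pooled = [a(f).mean(dim=(2, 3)) for a, f in zip(self.attn, feats)]
        return torch.cat(pooled, dim=1)  # [B, sum(channels)]

# Usage: three backbone levels with different channel widths.
feats = [torch.randn(2, c, s, s) for c, s in [(96, 56), (192, 28), (384, 14)]]
vec = MultiLevelChannelAttention([96, 192, 384])(feats)
print(vec.shape)  # torch.Size([2, 672])
```

Instantiating one such head per task would let, say, an age head and an identity head emphasize different channels and levels of the shared backbone, which is one plausible way to ease inter-task conflicts.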
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- FaceX-Zoo: A PyTorch Toolbox for Face Recognition [62.038018324643325]
We introduce a novel open-source framework, named FaceX-Zoo, oriented to the research and development community of face recognition.
FaceX-Zoo provides a training module with various supervisory heads and backbones towards state-of-the-art face recognition.
A simple yet fully functional face SDK is provided for the validation and primary application of the trained models.
arXiv Detail & Related papers (2021-01-12T11:06:50Z)
- Boosting Deep Face Recognition via Disentangling Appearance and Geometry [33.196270681809395]
We propose a framework for disentangling the appearance and geometry representations in the face recognition task.
We generate geometrically identical faces by incorporating spatial transformations.
We show that the proposed approach enhances the performance of deep face recognition models.
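A hedged sketch of how "geometrically identical faces" might be produced: apply the same randomly sampled affine warp to two different face crops, so the pair shares geometry while appearance differs. The sampling ranges and the torchvision-based warp are assumptions, not the paper's exact procedure.

```python
import torch
import torchvision.transforms.functional as TF

def geometrically_identical_pair(img_a: torch.Tensor, img_b: torch.Tensor):
    """Apply the SAME randomly sampled affine warp to two face crops so the
    pair shares geometry while appearance differs. Sketch only; the sampling
    ranges below are assumptions, not the paper's procedure.
    img_a, img_b: [C, H, W] image tensors."""
    angle = float(torch.empty(1).uniform_(-15.0, 15.0))  # rotation in degrees
    tx = int(torch.randint(-8, 9, (1,)))                 # horizontal shift (px)
    ty = int(torch.randint(-8, 9, (1,)))                 # vertical shift (px)
    scale = float(torch.empty(1).uniform_(0.9, 1.1))     # isotropic scaling
    warp = lambda im: TF.affine(im, angle=angle, translate=[tx, ty],
                                scale=scale, shear=[0.0])
    return warp(img_a), warp(img_b)

# Usage: two 112x112 RGB crops end up with identical geometry.
a, b = torch.rand(3, 112, 112), torch.rand(3, 112, 112)
wa, wb = geometrically_identical_pair(a, b)
print(wa.shape, wb.shape)
```

Training on such pairs would give a recognition model explicit examples where geometry is held fixed and only appearance varies, which is one way a disentangling objective could be supervised.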
arXiv Detail & Related papers (2020-01-13T23:19:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.