FaceXFormer: A Unified Transformer for Facial Analysis
- URL: http://arxiv.org/abs/2403.12960v2
- Date: Thu, 19 Dec 2024 22:48:46 GMT
- Title: FaceXFormer: A Unified Transformer for Facial Analysis
- Authors: Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel
- Abstract summary: FaceXFormer is an end-to-end unified transformer model capable of performing nine facial analysis tasks.
These tasks include face parsing, landmark detection, head pose estimation, attribute prediction, and estimation of age, gender, race, expression, and face visibility.
We propose a novel parameter-efficient decoder, FaceX, which jointly processes face and task tokens, thereby learning generalized and robust face representations.
- Score: 59.94066615853198
- License:
- Abstract: In this work, we introduce FaceXFormer, an end-to-end unified transformer model capable of performing nine facial analysis tasks including face parsing, landmark detection, head pose estimation, attribute prediction, and estimation of age, gender, race, expression, and face visibility within a single framework. Conventional methods in face analysis have often relied on task-specific designs and pre-processing techniques, which limit their scalability and integration into a unified architecture. Unlike these conventional methods, FaceXFormer leverages a transformer-based encoder-decoder architecture where each task is treated as a learnable token, enabling the seamless integration and simultaneous processing of multiple tasks within a single framework. Moreover, we propose a novel parameter-efficient decoder, FaceX, which jointly processes face and task tokens, thereby learning generalized and robust face representations across different tasks. We jointly trained FaceXFormer on nine face perception datasets and conducted experiments against specialized and multi-task models in both intra-dataset and cross-dataset evaluations across multiple benchmarks, showcasing state-of-the-art or competitive performance. Further, we performed a comprehensive analysis of different backbones for unified face task processing and evaluated our model "in-the-wild", demonstrating its robustness and generalizability. To the best of our knowledge, this is the first work to propose a single model capable of handling nine facial analysis tasks while maintaining real-time performance at 33.21 FPS.
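The abstract's central idea, treating each task as a learnable token that is processed jointly with face tokens, can be illustrated with a minimal sketch. This is not the FaceX decoder: the token dimension, the number of face tokens, the random initialisation, and the helper `cross_attend` are all hypothetical choices made for illustration, and the attention omits the learned projections a real transformer would use. It only shows how one token per task can gather information from a shared set of face tokens in a single pass.

```python
import math
import random

# The nine tasks named in the abstract; one token per task.
TASKS = [
    "parsing", "landmarks", "head_pose", "attributes", "age",
    "gender", "race", "expression", "visibility",
]

DIM = 8               # hypothetical token dimension
NUM_FACE_TOKENS = 16  # hypothetical number of face (image patch) tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention without learned projections.

    Each query (a task token) attends over all keys (face tokens);
    the keys also serve as the values, which is a simplification.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ])
    return outputs


rng = random.Random(0)

# Stand-ins for learned task tokens and encoder output (random here).
task_tokens = {t: [rng.gauss(0, 1) for _ in range(DIM)] for t in TASKS}
face_tokens = [[rng.gauss(0, 1) for _ in range(DIM)]
               for _ in range(NUM_FACE_TOKENS)]

# All nine task tokens query the same face tokens in one pass.
updated = cross_attend(list(task_tokens.values()), face_tokens)
task_outputs = dict(zip(TASKS, updated))
```

In a full model, each entry of `task_outputs` would feed a small task-specific head (for example a classifier for expression or a regressor for head pose); those heads are omitted here.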
Related papers
- Task-adaptive Q-Face [75.15668556061772]
We propose a novel task-adaptive multi-task face analysis method named Q-Face.
Q-Face simultaneously performs multiple face analysis tasks with a unified model.
Our method achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation.
arXiv Detail & Related papers (2024-05-15T03:13:11Z)
- Faceptor: A Generalist Model for Face Perception [52.8066001012464]
Faceptor adopts a well-designed single-encoder dual-decoder architecture.
Introducing Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
arXiv Detail & Related papers (2024-03-14T15:42:31Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis subnet.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
- Towards a Real-Time Facial Analysis System [13.649384403827359]
We present a system-level design of a real-time facial analysis system.
With a collection of deep neural networks for object detection, classification, and regression, the system recognizes age, gender, facial expression, and facial similarity for each person that appears in the camera view.
Results on common off-the-shelf architecture show that the system's accuracy is comparable to the state-of-the-art methods, and the recognition speed satisfies real-time requirements.
arXiv Detail & Related papers (2021-09-21T18:27:15Z)
- FaceX-Zoo: A PyTorch Toolbox for Face Recognition [62.038018324643325]
We introduce a novel open-source framework, named FaceX-Zoo, which is oriented to the research-development community of face recognition.
FaceX-Zoo provides a training module with various supervisory heads and backbones towards state-of-the-art face recognition.
A simple yet fully functional face SDK is provided for the validation and primary application of the trained models.
arXiv Detail & Related papers (2021-01-12T11:06:50Z)