FaceXFormer: A Unified Transformer for Facial Analysis
- URL: http://arxiv.org/abs/2403.12960v3
- Date: Mon, 10 Mar 2025 17:08:19 GMT
- Title: FaceXFormer: A Unified Transformer for Facial Analysis
- Authors: Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel
- Abstract summary: FaceXFormer is an end-to-end unified transformer model capable of performing ten facial analysis tasks. Tasks include face parsing, landmark detection, head pose estimation, attribute prediction, age, gender, and race estimation. We train FaceXFormer on ten diverse face perception datasets and evaluate it against both specialized and multi-task models.
- Score: 59.94066615853198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce FaceXFormer, an end-to-end unified transformer model capable of performing ten facial analysis tasks within a single framework. These tasks include face parsing, landmark detection, head pose estimation, attribute prediction, age, gender, and race estimation, facial expression recognition, face recognition, and face visibility. Traditional face analysis approaches rely on task-specific architectures and pre-processing techniques, limiting scalability and integration. In contrast, FaceXFormer employs a transformer-based encoder-decoder architecture, where each task is represented as a learnable token, enabling seamless multi-task processing within a unified model. To enhance efficiency, we introduce FaceX, a lightweight decoder with a novel bi-directional cross-attention mechanism, which jointly processes face and task tokens to learn robust and generalized facial representations. We train FaceXFormer on ten diverse face perception datasets and evaluate it against both specialized and multi-task models across multiple benchmarks, demonstrating state-of-the-art or competitive performance. Additionally, we analyze the impact of various components of FaceXFormer on performance, assess real-world robustness in "in-the-wild" settings, and conduct a computational performance evaluation. To the best of our knowledge, FaceXFormer is the first model capable of handling ten facial analysis tasks while maintaining real-time performance at 33.21 FPS. Code: https://github.com/Kartik-3004/facexformer
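The token-based design described in the abstract can be illustrated with a minimal sketch. This is not the authors' FaceX decoder: it assumes that "bi-directional cross-attention" means task tokens first attend to face tokens and face tokens then attend to the updated task tokens, each with a residual update. It omits learned query/key/value projections, multiple heads, normalization, and any task heads. The token dimension, the number of face tokens, and the names `cross_attend` and `bidirectional_block` are hypothetical choices made for illustration; only the ten task names come from the abstract.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query row attends over the context rows.

    Assumption: no learned Q/K/V projections and a single head, to keep the sketch short.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return out


def add(a, b):
    """Element-wise sum of two equally shaped lists of rows (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_block(task_tokens, face_tokens):
    """One hypothetical bi-directional step: task -> face, then face -> task."""
    task_tokens = add(task_tokens, cross_attend(task_tokens, face_tokens))
    face_tokens = add(face_tokens, cross_attend(face_tokens, task_tokens))
    return task_tokens, face_tokens


# The ten tasks listed in the abstract, one token per task.
TASKS = [
    "face parsing", "landmark detection", "head pose estimation",
    "attribute prediction", "age estimation", "gender estimation",
    "race estimation", "facial expression recognition",
    "face recognition", "face visibility",
]

random.seed(0)
DIM = 8          # hypothetical token width
NUM_FACE = 16    # hypothetical number of face (image patch) tokens

# Random values stand in for learnable task tokens and encoder features.
task_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in TASKS]
face_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_FACE)]

new_task, new_face = bidirectional_block(task_tokens, face_tokens)
print(len(new_task), len(new_task[0]), len(new_face), len(new_face[0]))  # prints "10 8 16 8"
```

Both token sets keep their shapes after the block, so in a design of this kind such blocks could be stacked and each updated task token passed to its own prediction head.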
Related papers
- Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning [5.178801281905521]
We propose Face-LLaVA, a large language model for face-centered, in-context learning, including facial expression and attribute recognition.
We first developed FaceInstruct-1M, a face-centered database for instruction tuning MLLMs for face processing.
We then developed a novel face-specific visual encoder powered by Face-Region Guided Cross-Attention.
arXiv Detail & Related papers (2025-04-09T18:26:07Z)
- Task-adaptive Q-Face [75.15668556061772]
We propose a novel task-adaptive multi-task face analysis method named Q-Face.
Q-Face simultaneously performs multiple face analysis tasks with a unified model.
Our method achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation.
arXiv Detail & Related papers (2024-05-15T03:13:11Z)
- Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification [13.995453649985732]
We propose a unified multi-branch vision transformer for facial expression recognition and mask wearing classification tasks.
Our approach extracts shared features for both tasks using a dual-branch architecture.
Our proposed framework reduces the overall complexity compared with using separate networks for both tasks.
arXiv Detail & Related papers (2024-04-22T22:02:19Z)
- Faceptor: A Generalist Model for Face Perception [52.8066001012464]
Faceptor adopts a well-designed single-encoder dual-decoder architecture.
Introducing Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
arXiv Detail & Related papers (2024-03-14T15:42:31Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- Toward High Quality Facial Representation Learning [58.873356953627614]
We propose a self-supervised pre-training framework called Mask Contrastive Face (MCF).
We use the feature map of a pre-trained visual backbone as a supervision item and use a partially pre-trained decoder for mask image modeling.
Our model achieves 0.932 NME$_{diag}$ for AFLW-19 face alignment and 93.96 F1 score for LaPa face parsing.
arXiv Detail & Related papers (2023-09-07T09:11:49Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis subnet.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- FaceX-Zoo: A PyTorch Toolbox for Face Recognition [62.038018324643325]
We introduce a novel open-source framework, named FaceX-Zoo, which is oriented to the research-development community of face recognition.
FaceX-Zoo provides a training module with various supervisory heads and backbones towards state-of-the-art face recognition.
A simple yet fully functional face SDK is provided for the validation and primary application of the trained models.
arXiv Detail & Related papers (2021-01-12T11:06:50Z) - MaskFace: multi-task face and landmark detector [0.0]
We present a highly accurate model for face and landmark detection.
The method, called MaskFace, extends previous face detection approaches by adding a keypoint prediction head.
We evaluate MaskFace's performance on a face detection task on the AFW, PASCAL face, FDDB, WIDER FACE datasets and a landmark localization task on the AFLW, 300W datasets.
arXiv Detail & Related papers (2020-05-19T13:09:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.