Proto-Former: Unified Facial Landmark Detection by Prototype Transformer
- URL: http://arxiv.org/abs/2510.15338v1
- Date: Fri, 17 Oct 2025 06:00:25 GMT
- Title: Proto-Former: Unified Facial Landmark Detection by Prototype Transformer
- Authors: Shengkai Hu, Haozhe Qi, Jun Wan, Jiaxing Huang, Lefei Zhang, Hang Sun, Dacheng Tao
- Abstract summary: Proto-Former is a unified, adaptive, end-to-end facial landmark detection framework. It enables joint training across multiple datasets within a unified architecture. Proto-Former achieves superior performance compared to existing state-of-the-art methods.
- Score: 77.47431726595111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have significantly improved facial landmark detection. However, existing facial landmark detection datasets often define different numbers of landmarks, and most mainstream methods can only be trained on a single dataset. This limits the model generalization to different datasets and hinders the development of a unified model. To address this issue, we propose Proto-Former, a unified, adaptive, end-to-end facial landmark detection framework that explicitly enhances dataset-specific facial structural representations (i.e., prototype). Proto-Former overcomes the limitations of single-dataset training by enabling joint training across multiple datasets within a unified architecture. Specifically, Proto-Former comprises two key components: an Adaptive Prototype-Aware Encoder (APAE) that performs adaptive feature extraction and learns prototype representations, and a Progressive Prototype-Aware Decoder (PPAD) that refines these prototypes to generate prompts that guide the model's attention to key facial regions. Furthermore, we introduce a novel Prototype-Aware (PA) loss, which achieves optimal path finding by constraining the selection weights of prototype experts. This loss function effectively resolves the problem of prototype expert addressing instability during multi-dataset training, alleviates gradient conflicts, and enables the extraction of more accurate facial structure features. Extensive experiments on widely used benchmark datasets demonstrate that our Proto-Former achieves superior performance compared to existing state-of-the-art methods. The code is publicly available at: https://github.com/Husk021118/Proto-Former.
Related papers
- Divide, Conquer and Unite: Hierarchical Style-Recalibrated Prototype Alignment for Federated Medical Image Segmentation [66.82598255715696]
Federated learning enables multiple medical institutions to train a global model without sharing data. Current approaches primarily focus on final-layer features, overlooking critical multi-level cues. We propose FedBCS to bridge feature representation gaps via domain-invariant contextual prototypes alignment.
arXiv Detail & Related papers (2025-11-14T04:15:34Z)
- Efficient Prototype Consistency Learning in Medical Image Segmentation via Joint Uncertainty and Data Augmentation [32.47805202531351]
Prototype learning has emerged in semi-supervised medical image segmentation. We propose an efficient prototype consistency learning via joint uncertainty quantification and data augmentation. Our framework is superior to previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-05-22T06:25:32Z)
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [81.93945602120453]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
- Learning with Mixture of Prototypes for Out-of-Distribution Detection [25.67011646236146]
Out-of-distribution (OOD) detection aims to detect testing samples far away from the in-distribution (ID) training data.
We propose PrototypicAl Learning with a Mixture of prototypes (PALM) which models each class with multiple prototypes to capture the sample diversities.
Our method achieves state-of-the-art average AUROC performance of 93.82 on the challenging CIFAR-100 benchmark.
arXiv Detail & Related papers (2024-02-05T00:52:50Z)
- ProtoDiff: Learning to Learn Prototypical Networks by Task-Guided Diffusion [44.805452233966534]
Prototype-based meta-learning has emerged as a powerful technique for addressing few-shot learning challenges.
We introduce ProtoDiff, a framework that gradually generates task-specific prototypes from random noise.
We conduct thorough ablation studies to demonstrate its ability to accurately capture the underlying prototype distribution.
arXiv Detail & Related papers (2023-06-26T15:26:24Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Rethinking Semantic Segmentation: A Prototype View [126.59244185849838]
We present a nonparametric semantic segmentation model based on non-learnable prototypes.
Our framework yields compelling results over several datasets.
We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
arXiv Detail & Related papers (2022-03-28T21:15:32Z)
- Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
- Prototype Completion for Few-Shot Learning [13.63424509914303]
Few-shot learning aims to recognize novel classes with few examples.
Pre-training-based methods tackle the problem effectively by pre-training a feature extractor and then fine-tuning it through nearest-centroid-based meta-learning.
We propose a novel prototype completion based meta-learning framework.
arXiv Detail & Related papers (2021-08-11T03:44:00Z)