PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training
- URL: http://arxiv.org/abs/2508.09691v1
- Date: Wed, 13 Aug 2025 10:37:41 GMT
- Title: PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training
- Authors: Yin Xie, Zhichao Chen, Xiaoze Yu, Yongle Zhao, Xiang An, Kaicheng Yang, Zimin Ran, Jia Guo, Ziyong Feng, Jiankang Deng,
- Abstract summary: PaCo-FR is an unsupervised framework that combines masked image modeling with patch-pixel alignment.<n>PaCo-FR achieves state-of-the-art performance across several facial analysis tasks with just 2 million unlabeled images for pre-training.
- Score: 32.52750192639004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial representation pre-training is crucial for tasks like facial recognition, expression analysis, and virtual reality. However, existing methods face three key challenges: (1) failing to capture distinct facial features and fine-grained semantics, (2) ignoring the spatial structure inherent to facial anatomy, and (3) inefficiently utilizing limited labeled data. To overcome these, we introduce PaCo-FR, an unsupervised framework that combines masked image modeling with patch-pixel alignment. Our approach integrates three innovative components: (1) a structured masking strategy that preserves spatial coherence by aligning with semantically meaningful facial regions, (2) a novel patch-based codebook that enhances feature discrimination with multiple candidate tokens, and (3) spatial consistency constraints that preserve geometric relationships between facial components. PaCo-FR achieves state-of-the-art performance across several facial analysis tasks with just 2 million unlabeled images for pre-training. Our method demonstrates significant improvements, particularly in scenarios with varying poses, occlusions, and lighting conditions. We believe this work advances facial representation learning and offers a scalable, efficient solution that reduces reliance on expensive annotated datasets, driving more effective facial analysis systems.
Related papers
- Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation [4.383815913901858]
Face-MakeUpV2 is a facial image generation model that aims to maintain consistency of face ID and physical characteristics with the reference image.<n>In experiments, Face-MakeUpV2 achieves best overall performance in terms of preserving face ID and maintaining physical consistency of the reference images.
arXiv Detail & Related papers (2025-10-17T09:31:08Z) - Discrete Facial Encoding: : A Framework for Data-driven Facial Display Discovery [6.096726247356906]
We introduce Discrete Facial, an unsupervised, data-driven alternative of compact and interpretable dictionary of facial expressions.<n>Our system consistently outperforms both FACS-based pipelines and strong image and video representation learning models.<n>Our representation covers a wider variety of facial displays, highlighting its potential as a scalable and effective alternative to FACS for psychological and affective computing applications.
arXiv Detail & Related papers (2025-10-02T04:44:45Z) - From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts [69.44297222099175]
We introduce a Mixture of Facial Experts (MoFE) that captures distinct but mutually reinforcing aspects of facial attributes.<n>To mitigate dataset limitations, we have tailored a data processing pipeline centered on two key aspects: Face Constraints and Identity Consistency.<n>We have curated and refined a Large Face Angles (LFA) dataset from existing open-source human video datasets.
arXiv Detail & Related papers (2025-08-13T04:10:16Z) - SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches [116.1810651297801]
SketchYourSeg establishes freehand sketches as a powerful query modality for subjective image segmentation.<n>Our evaluations demonstrate superior performance over existing approaches across diverse benchmarks.
arXiv Detail & Related papers (2025-01-27T13:07:51Z) - Self-Supervised Facial Representation Learning with Facial Region
Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z) - Text-Guided Face Recognition using Multi-Granularity Cross-Modal
Contrastive Learning [0.0]
We introduce text-guided face recognition (TGFR) to analyze the impact of integrating facial attributes in the form of natural language descriptions.
TGFR demonstrates remarkable improvements, particularly on low-quality images, over existing face recognition models.
arXiv Detail & Related papers (2023-12-14T22:04:22Z) - Effective Adapter for Face Recognition in the Wild [72.75516495170199]
We tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions.
Traditional approaches-either training models directly on degraded images or their enhanced counterparts using face restoration techniques-have proven ineffective.
We propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets.
arXiv Detail & Related papers (2023-12-04T08:55:46Z) - Improving Face Recognition from Caption Supervision with Multi-Granular
Contextual Feature Aggregation [0.0]
We introduce caption-guided face recognition (CGFR) as a new framework to improve the performance of commercial-off-the-shelf (COTS) face recognition systems.
We implement the proposed CGFR framework on two face recognition models (ArcFace and AdaFace) and evaluated its performance on the Multi-Modal CelebA-HQ dataset.
arXiv Detail & Related papers (2023-08-13T23:52:15Z) - Self-supervised Contrastive Learning of Multi-view Facial Expressions [9.949781365631557]
Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems.
We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER.
arXiv Detail & Related papers (2021-08-15T11:23:34Z) - Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo
Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D face only from images without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z) - Towards NIR-VIS Masked Face Recognition [47.00916333095693]
Near-infrared to visible (NIR-VIS) face recognition is the most common case in heterogeneous face recognition.
We propose a novel training method to maximize the mutual information shared by the face representation of two domains.
In addition, a 3D face reconstruction based approach is employed to synthesize masked face from the existing NIR image.
arXiv Detail & Related papers (2021-04-14T10:40:09Z) - Learning Oracle Attention for High-fidelity Face Completion [121.72704525675047]
We design a comprehensive framework for face completion based on the U-Net structure.
We propose a dual spatial attention module to efficiently learn the correlations between facial textures at multiple scales.
We take the location of the facial components as prior knowledge and impose a multi-discriminator on these regions.
arXiv Detail & Related papers (2020-03-31T01:37:10Z) - Dual-Attention GAN for Large-Pose Face Frontalization [59.689836951934694]
We present a novel Dual-Attention Generative Adversarial Network (DA-GAN) for photo-realistic face frontalization.
Specifically, a self-attention-based generator is introduced to integrate local features with their long-range dependencies.
A novel face-attention-based discriminator is applied to emphasize local features of face regions.
arXiv Detail & Related papers (2020-02-17T20:00:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.