ProS: Facial Omni-Representation Learning via Prototype-based
Self-Distillation
- URL: http://arxiv.org/abs/2311.01929v2
- Date: Tue, 7 Nov 2023 15:34:42 GMT
- Title: ProS: Facial Omni-Representation Learning via Prototype-based
Self-Distillation
- Authors: Xing Di, Yiyu Zheng, Xiaoming Liu, Yu Cheng
- Abstract summary: Prototype-based Self-Distillation (ProS) is a novel approach for unsupervised face representation learning.
ProS consists of two vision-transformers (teacher and student models) that are trained with different augmented images.
ProS achieves state-of-the-art performance on various tasks, both in full and few-shot settings.
- Score: 22.30414271893046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach, called Prototype-based
Self-Distillation (ProS), for unsupervised face representation learning. The
existing supervised methods heavily rely on a large amount of annotated
training facial data, which poses challenges in terms of data collection and
privacy concerns. To address these issues, we propose ProS, which leverages a
vast collection of unlabeled face images to learn a comprehensive facial
omni-representation. In particular, ProS consists of two vision-transformers
(teacher and student models) that are trained with different augmented images
(cropping, blurring, coloring, etc.). In addition, we build a face-aware retrieval
system, together with augmentations, to obtain curated images consisting
predominantly of facial regions. To enhance the discrimination of learned features,
we introduce a prototype-based matching loss that aligns the similarity
distributions between features (teacher or student) and a set of learnable
prototypes. After pre-training, the teacher vision transformer serves as a
backbone for downstream tasks, including attribute estimation, expression
recognition, and landmark alignment, achieved through simple fine-tuning with
additional layers. Extensive experiments demonstrate that our method achieves
state-of-the-art performance on various tasks, both in full and few-shot
settings. Furthermore, we investigate pre-training with synthetic face images,
and ProS exhibits promising performance in this scenario as well.
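As a rough illustration of the mechanism the abstract describes, the sketch below (PyTorch) shows one way a prototype-based matching loss between a teacher and a student can be written: features from two differently augmented views are compared against a shared set of learnable prototypes, and the student's softmax assignment distribution is trained to match the teacher's, while the teacher is updated as an EMA of the student. The class and function names (PrototypeMatchingLoss, ema_update) and all hyper-parameters (feature dimension, number of prototypes, temperatures, momentum) are illustrative assumptions, not details taken from ProS.

```python
# Minimal sketch of a prototype-based self-distillation objective.
# Hyper-parameters and names below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeMatchingLoss(nn.Module):
    def __init__(self, feat_dim=256, num_prototypes=4096,
                 student_temp=0.1, teacher_temp=0.04):
        super().__init__()
        # Learnable prototypes shared by teacher and student branches.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        self.student_temp = student_temp
        self.teacher_temp = teacher_temp

    def forward(self, student_feat, teacher_feat):
        # L2-normalise features and prototypes so similarities are cosine similarities.
        protos = F.normalize(self.prototypes, dim=-1)
        s = F.normalize(student_feat, dim=-1) @ protos.t()   # (B, K) student-prototype similarities
        t = F.normalize(teacher_feat, dim=-1) @ protos.t()   # (B, K) teacher-prototype similarities

        # Align the student's prototype-assignment distribution with the teacher's.
        teacher_probs = F.softmax(t / self.teacher_temp, dim=-1).detach()
        student_logp = F.log_softmax(s / self.student_temp, dim=-1)
        return -(teacher_probs * student_logp).sum(dim=-1).mean()

# Usage sketch: `student` and `teacher` are ViT backbones with identical architecture;
# the teacher receives no gradients and is updated as an EMA of the student.
@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1 - momentum)
```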
Related papers
- FACE-AUDITOR: Data Auditing in Facial Recognition Systems [24.082527732931677]
Few-shot-based facial recognition systems have gained increasing attention due to their scalability and ability to work with a few face images.
To prevent the face images from being misused, one straightforward approach is to modify the raw face images before sharing them.
We propose a complete toolkit FACE-AUDITOR that can query the few-shot-based facial recognition model and determine whether any of a user's face images is used in training the model.
arXiv Detail & Related papers (2023-04-05T23:03:54Z)
- Understanding Self-Supervised Pretraining with Part-Aware Representation Learning [88.45460880824376]
We study the capability of self-supervised representation pretraining methods to learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z)
- Pose-disentangled Contrastive Learning for Self-supervised Facial Representation [12.677909048435408]
We propose a novel Pose-disentangled Contrastive Learning (PCL) method for general self-supervised facial representation.
Our PCL first devises a pose-disentangled decoder (PDD), which disentangles the pose-related features from the face-aware features.
We then introduce a pose-related contrastive learning scheme that learns pose-related information based on data augmentation of the same image.
arXiv Detail & Related papers (2022-11-24T09:30:51Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance across six datasets with distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- General Facial Representation Learning in a Visual-Linguistic Manner [45.92447707178299]
We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
arXiv Detail & Related papers (2021-12-06T15:22:05Z)
- Self-supervised Contrastive Learning of Multi-view Facial Expressions [9.949781365631557]
Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems.
We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER.
arXiv Detail & Related papers (2021-08-15T11:23:34Z)
- Pro-UIGAN: Progressive Face Hallucination from Occluded Thumbnails [53.080403912727604]
We propose a multi-stage Progressive Upsampling and Inpainting Generative Adversarial Network, dubbed Pro-UIGAN.
It exploits facial geometry priors to replenish and upsample (8×) occluded and tiny faces.
Pro-UIGAN produces visually pleasing HR faces and achieves superior performance on downstream tasks.
arXiv Detail & Related papers (2021-08-02T02:29:24Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method that jointly learns facial expression synthesis and recognition for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)