Toward High Quality Facial Representation Learning
- URL: http://arxiv.org/abs/2309.03575v1
- Date: Thu, 7 Sep 2023 09:11:49 GMT
- Title: Toward High Quality Facial Representation Learning
- Authors: Yue Wang, Jinlong Peng, Jiangning Zhang, Ran Yi, Liang Liu, Yabiao
Wang, Chengjie Wang
- Abstract summary: We propose a self-supervised pre-training framework, called Mask Contrastive Face (MCF)
We use the feature map of a pre-trained visual backbone as a supervision signal and a partially pre-trained decoder for mask image modeling.
Our model achieves 0.932 NME$_{diag}$ on AFLW-19 face alignment and a 93.96 F1 score on LaPa face parsing.
- Score: 58.873356953627614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face analysis tasks have a wide range of applications, but
universal facial representation has been explored in only a few works. In this paper, we
explore high-performance pre-training methods to boost the face analysis tasks
such as face alignment and face parsing. We propose a self-supervised
pre-training framework, called \textbf{\it Mask Contrastive Face (MCF)}, with
mask image modeling and a contrastive strategy specially adjusted for face
domain tasks. To improve the facial representation quality, we use the feature
map of a pre-trained visual backbone as a supervision signal and a partially
pre-trained decoder for mask image modeling. To preserve face identity during
the pre-training stage, we further use random masks to build contrastive
learning pairs. We conduct pre-training on the LAION-FACE-cropped dataset, a
variant of LAION-FACE 20M, which contains more than 20 million face images
collected from Internet websites. To study pre-training efficiency, we evaluate
our framework on a small subset of LAION-FACE-cropped and verify its
superiority under different pre-training settings. Our model pre-trained with
the full pre-training dataset outperforms the state-of-the-art methods on
multiple downstream tasks. Our model achieves 0.932 NME$_{diag}$ for AFLW-19
face alignment and 93.96 F1 score for LaPa face parsing. Code is available at
https://github.com/nomewang/MCF.
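As a rough illustration of the two objectives in the abstract, a NumPy sketch of masked feature regression against a pre-trained backbone's feature map plus a contrastive term built from two random masks of the same image. This is not the authors' code: the encoder, mask ratio, pooling, and loss weighting are toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_patch_mask(n_patches, mask_ratio, rng):
    """Boolean mask over patch tokens; True = masked (hidden from the encoder)."""
    n_mask = int(n_patches * mask_ratio)
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_mask]] = True
    return mask

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss: matching rows of z1/z2 are the positive pairs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (B, B) similarity matrix
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives on the diagonal

B, P, D = 4, 16, 8                                        # batch, patch tokens, feature dim
tokens = rng.normal(size=(B, P, D))                       # stand-in student features
teacher = tokens + 0.1 * rng.normal(size=(B, P, D))       # stand-in backbone feature map

# Mask image modeling: regress teacher features at the masked positions.
m = random_patch_mask(P, 0.75, rng)
mim_loss = np.mean((tokens[:, m] - teacher[:, m]) ** 2)

# Contrastive term: two random masks of the same image form a positive pair.
m1, m2 = random_patch_mask(P, 0.75, rng), random_patch_mask(P, 0.75, rng)
z1 = tokens[:, ~m1].mean(axis=1)                          # pool visible tokens of view 1
z2 = tokens[:, ~m2].mean(axis=1)                          # pool visible tokens of view 2
total_loss = mim_loss + info_nce(z1, z2)
```

Because the two views come from the same image, the diagonal of the similarity matrix carries the positives, so no identity labels are needed during pre-training.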
Related papers
- Bridging the Gaps: Utilizing Unlabeled Face Recognition Datasets to Boost Semi-Supervised Facial Expression Recognition [5.750927184237346]
We focus on utilizing large unlabeled Face Recognition (FR) datasets to boost semi-supervised FER.
Specifically, we first perform face reconstruction pre-training on large-scale facial images without annotations.
To further alleviate the scarcity of labeled and diverse images, we propose a Mixup-based data augmentation strategy.
arXiv Detail & Related papers (2024-10-23T07:26:19Z)
- 15M Multimodal Facial Image-Text Dataset [5.552727861734425]
FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features.
We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M.
arXiv Detail & Related papers (2024-07-11T14:00:14Z)
- Self-Supervised Facial Representation Learning with Facial Region Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- DiffFace: Diffusion-based Face Swapping with Facial Guidance [24.50570533781642]
We propose DiffFace, the first diffusion-based face swapping framework.
It consists of training an ID-conditional DDPM, sampling with facial guidance, and target-preserving blending.
DiffFace offers benefits such as training stability, high fidelity, sample diversity, and controllability.
arXiv Detail & Related papers (2022-12-27T02:51:46Z)
- A Unified View of Masked Image Modeling [117.79456335844439]
Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers.
We introduce a simple yet effective method, termed MaskDistill, which reconstructs normalized semantic features from teacher models at the masked positions.
Experimental results on image classification and semantic segmentation show that MaskDistill achieves performance comparable or superior to state-of-the-art methods.
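The blurb above describes regressing normalized teacher features at masked positions; a minimal sketch of such a distillation loss follows. The function name and the layer-norm-style normalization are assumptions for illustration, not MaskDistill's actual implementation.

```python
import numpy as np

def masked_feature_distill_loss(student_pred, teacher_feat, mask, eps=1e-6):
    """MSE between student predictions and per-token normalized teacher
    features, computed only at masked positions (mask True = masked)."""
    t = teacher_feat - teacher_feat.mean(axis=-1, keepdims=True)
    t = t / (t.std(axis=-1, keepdims=True) + eps)   # layer-norm-style normalization
    return np.mean((student_pred - t)[mask] ** 2)

rng = np.random.default_rng(0)
B, P, D = 2, 16, 8
teacher = rng.normal(size=(B, P, D))                # stand-in teacher feature map
mask = rng.random((B, P)) < 0.4                     # which patch tokens were masked
student = rng.normal(size=(B, P, D))                # stand-in decoder predictions
loss = masked_feature_distill_loss(student, teacher, mask)
```

Restricting the loss to masked tokens is what makes this a masked-image-modeling objective rather than plain feature distillation.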
arXiv Detail & Related papers (2022-10-19T14:59:18Z)
- General Facial Representation Learning in a Visual-Linguistic Manner [45.92447707178299]
We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
arXiv Detail & Related papers (2021-12-06T15:22:05Z)
- FedFace: Collaborative Learning of Face Recognition Model [66.84737075622421]
FedFace is a framework for collaborative learning of face recognition models.
It learns an accurate and generalizable face recognition model in which the face images stored at each client are shared with neither the other clients nor the central host.
Our code and pre-trained models will be publicly available.
arXiv Detail & Related papers (2021-04-07T09:25:32Z)
- Semi-Siamese Training for Shallow Face Learning [78.7386209619276]
We introduce a novel training method named Semi-Siamese Training (SST).
A pair of Semi-Siamese networks constitute the forward propagation structure, and the training loss is computed with an updating gallery queue.
Our method is developed without extra dependencies, so it can be flexibly integrated with existing loss functions and network architectures.
arXiv Detail & Related papers (2020-07-16T15:20:04Z)
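The Semi-Siamese setup in the last entry, a probe/gallery network pair plus an updating gallery queue, can be caricatured as below; the queue size, momentum constant, and update rule are illustrative assumptions, not SST's published hyperparameters.

```python
import numpy as np

class GalleryQueue:
    """Fixed-size FIFO queue of gallery embeddings, refreshed each step
    (a stand-in for SST's updating gallery queue)."""
    def __init__(self, size, dim):
        self.feats = np.zeros((size, dim))
        self.ptr = 0

    def enqueue(self, batch):
        for f in batch:
            self.feats[self.ptr] = f               # overwrite the oldest slot
            self.ptr = (self.ptr + 1) % len(self.feats)

def momentum_update(gallery_w, probe_w, m=0.999):
    """Gallery network tracks the probe network via an exponential moving
    average instead of sharing weights exactly (hence semi-Siamese)."""
    return m * gallery_w + (1 - m) * probe_w

queue = GalleryQueue(size=8, dim=4)
queue.enqueue(np.ones((3, 4)))                     # three gallery features enter the queue
gallery_w = momentum_update(np.zeros(4), np.ones(4))
```

The queue keeps the set of negative prototypes fresh even when each identity has only a few (shallow) training images.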
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.