General Facial Representation Learning in a Visual-Linguistic Manner
- URL: http://arxiv.org/abs/2112.03109v1
- Date: Mon, 6 Dec 2021 15:22:05 GMT
- Title: General Facial Representation Learning in a Visual-Linguistic Manner
- Authors: Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen,
Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
- Abstract summary: We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
- Score: 45.92447707178299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How to learn a universal facial representation that boosts all face analysis
tasks? This paper takes one step toward this goal. In this paper, we study the
transfer performance of pre-trained models on face analysis tasks and introduce
a framework, called FaRL, for general Facial Representation Learning in a
visual-linguistic manner. On one hand, the framework involves a contrastive
loss to learn high-level semantic meaning from image-text pairs. On the other
hand, we propose simultaneously exploiting low-level information to further
enhance the face representation by adding a masked image modeling objective. We
perform pre-training on LAION-FACE, a dataset containing a large amount of face
image-text pairs, and evaluate the representation capability on multiple
downstream tasks. We show that FaRL achieves better transfer performance
compared with previous pre-trained models. We also verify its superiority in
the low-data regime. More importantly, our model surpasses the state-of-the-art
methods on face analysis tasks including face parsing and face alignment.
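The abstract describes a two-part pre-training objective: a contrastive loss over image-text pairs for high-level semantics, plus a masked-image-modeling term for low-level detail. A minimal pure-Python sketch of such a combined objective is below; the function names, the symmetric InfoNCE formulation, and the weighting factor `lam` are illustrative assumptions, not details taken from the paper.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _mean_diag_cross_entropy(logits):
    """Mean over rows of -log softmax(row)[i] at the diagonal entry,
    i.e. cross-entropy with the matched pair as the target class."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric contrastive (InfoNCE) loss: the image and
    text embedding at the same batch index form the positive pair,
    all other pairings in the batch are negatives."""
    img = [_normalize(v) for v in img_emb]
    txt = [_normalize(v) for v in txt_emb]
    logits = [[sum(a * b for a, b in zip(u, v)) / temperature for v in txt]
              for u in img]
    logits_t = [list(col) for col in zip(*logits)]  # text -> image direction
    return 0.5 * (_mean_diag_cross_entropy(logits)
                  + _mean_diag_cross_entropy(logits_t))

def mim_loss(pred_patches, target_patches, masked_indices):
    """Masked-image-modeling term: mean squared reconstruction error,
    counted only on the patches that were masked out."""
    errs = [(p - t) ** 2
            for i in masked_indices
            for p, t in zip(pred_patches[i], target_patches[i])]
    return sum(errs) / len(errs)

def combined_objective(img_emb, txt_emb, pred, target, masked, lam=1.0):
    """Total pre-training loss: contrastive + lam * MIM."""
    return contrastive_loss(img_emb, txt_emb) + lam * mim_loss(pred, target, masked)
```

With perfectly aligned image/text embeddings the contrastive term approaches zero, while mismatched pairs are penalized; `lam` trades off the two objectives and would be tuned in practice.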
Related papers
- FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning [27.34249750803211]
We propose a self-supervised pretraining framework to learn fundamental representations of real face images.
Our model transfers better than supervised pre-training and visual and facial self-supervised learning methods, and even outperforms task-specialized SOTA methods.
arXiv Detail & Related papers (2024-12-16T17:58:45Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- FaceXFormer: A Unified Transformer for Facial Analysis [59.94066615853198]
FaceXFormer is an end-to-end unified transformer model capable of performing nine facial analysis tasks.
These tasks include face parsing, landmark detection, head pose estimation, attribute prediction, and estimation of age, gender, race, expression, and face visibility.
We propose a novel parameter-efficient decoder, FaceX, which jointly processes face and task tokens, thereby learning generalized and robust face representations.
arXiv Detail & Related papers (2024-03-19T17:58:04Z)
- Self-Supervised Facial Representation Learning with Facial Region Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- Toward High Quality Facial Representation Learning [58.873356953627614]
We propose a self-supervised pre-training framework, called Mask Contrastive Face (MCF).
We use the feature map of a pre-trained visual backbone as supervision and a partially pre-trained decoder for masked image modeling.
Our model achieves 0.932 NME_diag on AFLW-19 face alignment and a 93.96 F1 score on LaPa face parsing.
arXiv Detail & Related papers (2023-09-07T09:11:49Z)
- Pose-disentangled Contrastive Learning for Self-supervised Facial Representation [12.677909048435408]
We propose a novel Pose-disentangled Contrastive Learning (PCL) method for general self-supervised facial representation.
Our PCL first devises a pose-disentangled decoder (PDD), which disentangles the pose-related features from the face-aware features.
We then introduce a pose-related contrastive learning scheme that learns pose-related information based on data augmentation of the same image.
arXiv Detail & Related papers (2022-11-24T09:30:51Z)
- Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning [40.544844623958426]
We propose a novel Semantic-Driven Generative Adversarial Network to address the above issues.
Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator.
We construct two types of representational graphs via semantic parsing maps upon input faces, dubbed the IntrA-class Semantic Graph (IASG) and the InteR-class Structure Graph (IRSG).
arXiv Detail & Related papers (2022-01-05T13:14:14Z)
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
One of our main findings: unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.