General Facial Representation Learning in a Visual-Linguistic Manner
- URL: http://arxiv.org/abs/2112.03109v1
- Date: Mon, 6 Dec 2021 15:22:05 GMT
- Title: General Facial Representation Learning in a Visual-Linguistic Manner
- Authors: Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen,
Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
- Abstract summary: We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
- Score: 45.92447707178299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How to learn a universal facial representation that boosts all face analysis
tasks? This paper takes one step toward this goal. In this paper, we study the
transfer performance of pre-trained models on face analysis tasks and introduce
a framework, called FaRL, for general Facial Representation Learning in a
visual-linguistic manner. On one hand, the framework involves a contrastive
loss to learn high-level semantic meaning from image-text pairs. On the other
hand, we propose to exploit low-level information simultaneously to further
enhance the face representation by adding a masked image modeling objective. We
perform pre-training on LAION-FACE, a dataset containing a large number of face
image-text pairs, and evaluate the representation capability on multiple
downstream tasks. We show that FaRL achieves better transfer performance
compared with previous pre-trained models. We also verify its superiority in
the low-data regime. More importantly, our model surpasses the state-of-the-art
methods on face analysis tasks including face parsing and face alignment.
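The abstract describes two objectives trained jointly: a CLIP-style image-text contrastive loss for high-level semantics and a masked image modeling (MIM) loss for low-level detail. The following is a minimal NumPy sketch of how such a combined objective can be computed; the function names, the MSE-style reconstruction target, and the weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss (CLIP-style).
    Matched pairs share a row index; all other rows act as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(img))

    def cross_entropy(l):
        # Row-wise softmax cross-entropy with the diagonal as targets.
        l = l - l.max(axis=1, keepdims=True)
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def mim_loss(pred_patches, target_patches, mask):
    """Masked image modeling: regress only the masked patches."""
    return ((pred_patches - target_patches) ** 2)[mask].mean()

def combined_loss(img_emb, txt_emb, pred, target, mask, mim_weight=1.0):
    """Joint objective: contrastive semantics + low-level reconstruction."""
    return info_nce(img_emb, txt_emb) + mim_weight * mim_loss(pred, target, mask)
```

In practice the image embeddings, text embeddings, and patch predictions would come from the vision and text encoders; here they are plain arrays so the loss arithmetic can be inspected in isolation.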
Related papers
- Face-MLLM: A Large Face Perception Model [53.9441375205716]
Multimodal large language models (MLLMs) have achieved promising results on a wide range of vision-language tasks, but their ability to perceive and understand human faces remains largely unexplored.
In this work, we comprehensively evaluate existing MLLMs on face perception tasks.
Our model surpasses previous MLLMs on five famous face perception tasks.
arXiv Detail & Related papers (2024-10-28T04:19:32Z)
- Self-Supervised Facial Representation Learning with Facial Region Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- A Generative Framework for Self-Supervised Facial Representation Learning [18.094262972295702]
Self-supervised representation learning has gained increasing attention for strong generalization ability without relying on paired datasets.
Self-supervised facial representation learning remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light.
We propose LatentFace, a novel generative framework for self-supervised facial representations.
arXiv Detail & Related papers (2023-09-15T09:34:05Z)
- Toward High Quality Facial Representation Learning [58.873356953627614]
We propose a self-supervised pre-training framework, called Mask Contrastive Face (MCF).
We use the feature map of a pre-trained visual backbone as supervision and a partially pre-trained decoder for masked image modeling.
Our model achieves 0.932 NME_diag on AFLW-19 face alignment and a 93.96 F1 score on LaPa face parsing.
arXiv Detail & Related papers (2023-09-07T09:11:49Z)
- Pose-disentangled Contrastive Learning for Self-supervised Facial Representation [12.677909048435408]
We propose a novel Pose-disentangled Contrastive Learning (PCL) method for general self-supervised facial representation.
Our PCL first devises a pose-disentangled decoder (PDD), which disentangles the pose-related features from the face-aware features.
We then introduce a pose-related contrastive learning scheme that learns pose-related information based on data augmentation of the same image.
arXiv Detail & Related papers (2022-11-24T09:30:51Z)
- Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning [40.544844623958426]
We propose a novel Semantic-Driven Generative Adversarial Network to address the above issues.
Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator.
We construct two types of representational graphs via semantic parsing maps upon input faces, dubbed the IntrA-class Semantic Graph (IASG) and the InteR-class Structure Graph (IRSG).
arXiv Detail & Related papers (2022-01-05T13:14:14Z)
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
Our two main findings are: unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
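The InterFaceGAN entry above describes disentangling semantics found as linear directions in the latent space via subspace projection. A standard way to realize this, sketched below under the assumption that each semantic is a single unit normal vector of a linear separation boundary (as in that line of work), is to project one editing direction onto the orthogonal complement of a conditioned direction:

```python
import numpy as np

def condition_direction(primal, condition):
    """Subspace projection: remove the component of the primal editing
    direction that lies along the conditioned semantic direction,
    i.e. n1' = n1 - (n1 . n2) n2 with n2 unit-normalized. Moving a
    latent code along n1' then changes the primal attribute while
    leaving the conditioned attribute unchanged to first order."""
    condition = condition / np.linalg.norm(condition)
    edited = primal - (primal @ condition) * condition
    return edited / np.linalg.norm(edited)
```

A latent code `z` would then be edited as `z + alpha * condition_direction(n_age, n_glasses)` to, say, change apparent age without adding glasses; the attribute names here are illustrative.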
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.