General Facial Representation Learning in a Visual-Linguistic Manner
- URL: http://arxiv.org/abs/2112.03109v1
- Date: Mon, 6 Dec 2021 15:22:05 GMT
- Title: General Facial Representation Learning in a Visual-Linguistic Manner
- Authors: Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen,
Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
- Abstract summary: We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
- Score: 45.92447707178299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How to learn a universal facial representation that boosts all face analysis
tasks? This paper takes one step toward this goal. In this paper, we study the
transfer performance of pre-trained models on face analysis tasks and introduce
a framework, called FaRL, for general Facial Representation Learning in a
visual-linguistic manner. On one hand, the framework involves a contrastive
loss to learn high-level semantic meaning from image-text pairs. On the other
hand, we propose simultaneously exploiting low-level information to further
enhance the face representation by adding a masked image modeling objective. We
perform pre-training on LAION-FACE, a dataset containing a large amount of face
image-text pairs, and evaluate the representation capability on multiple
downstream tasks. We show that FaRL achieves better transfer performance
compared with previous pre-trained models. We also verify its superiority in
the low-data regime. More importantly, our model surpasses the state-of-the-art
methods on face analysis tasks including face parsing and face alignment.
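The abstract describes a two-part pre-training objective: a contrastive loss over image-text pairs for high-level semantics, plus a masked-image-modeling term for low-level detail. A minimal pure-Python sketch of such a combined objective is below; the function names, the symmetric InfoNCE formulation, and the weighting factor `lam` are illustrative assumptions, not details taken from the paper.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _mean_diag_cross_entropy(logits):
    """Mean over rows of -log softmax(row)[i] at the diagonal entry,
    i.e. cross-entropy with the matched pair as the target class."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric contrastive (InfoNCE) loss: the image and
    text embedding at the same batch index form the positive pair,
    all other pairings in the batch are negatives."""
    img = [_normalize(v) for v in img_emb]
    txt = [_normalize(v) for v in txt_emb]
    logits = [[sum(a * b for a, b in zip(u, v)) / temperature for v in txt]
              for u in img]
    logits_t = [list(col) for col in zip(*logits)]  # text -> image direction
    return 0.5 * (_mean_diag_cross_entropy(logits)
                  + _mean_diag_cross_entropy(logits_t))

def mim_loss(pred_patches, target_patches, masked_indices):
    """Masked-image-modeling term: mean squared reconstruction error,
    counted only on the patches that were masked out."""
    errs = [(p - t) ** 2
            for i in masked_indices
            for p, t in zip(pred_patches[i], target_patches[i])]
    return sum(errs) / len(errs)

def combined_objective(img_emb, txt_emb, pred, target, masked, lam=1.0):
    """Total pre-training loss: contrastive + lam * MIM."""
    return contrastive_loss(img_emb, txt_emb) + lam * mim_loss(pred, target, masked)
```

With perfectly aligned image/text embeddings the contrastive term approaches zero, while mismatched pairs are penalized; `lam` trades off the two objectives and would be tuned in practice.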
Related papers
- FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning [27.34249750803211]
We propose a self-supervised pretraining framework to learn fundamental representations of real face images.
Our model transfers better than supervised pre-training and visual and facial self-supervised learning methods, and even outperforms task-specialized SOTA methods.
arXiv Detail & Related papers (2024-12-16T17:58:45Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- FaceXFormer: A Unified Transformer for Facial Analysis [59.94066615853198]
FaceXFormer is an end-to-end unified transformer model capable of performing nine facial analysis tasks.
These tasks include face parsing, landmark detection, head pose estimation, attribute prediction, and estimation of age, gender, race, expression, and face visibility.
We propose a novel parameter-efficient decoder, FaceX, which jointly processes face and task tokens, thereby learning generalized and robust face representations.
arXiv Detail & Related papers (2024-03-19T17:58:04Z)
- Self-Supervised Facial Representation Learning with Facial Region Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- Toward High Quality Facial Representation Learning [58.873356953627614]
We propose a self-supervised pre-training framework, called Mask Contrastive Face (MCF).
We use the feature map of a pre-trained visual backbone as supervision and a partially pre-trained decoder for masked image modeling.
Our model achieves 0.932 NME_diag on AFLW-19 face alignment and a 93.96 F1 score on LaPa face parsing.
arXiv Detail & Related papers (2023-09-07T09:11:49Z)
- Pose-disentangled Contrastive Learning for Self-supervised Facial Representation [12.677909048435408]
We propose a novel Pose-disentangled Contrastive Learning (PCL) method for general self-supervised facial representation.
Our PCL first devises a pose-disentangled decoder (PDD), which disentangles the pose-related features from the face-aware features.
We then introduce a pose-related contrastive learning scheme that learns pose-related information based on data augmentation of the same image.
arXiv Detail & Related papers (2022-11-24T09:30:51Z)
- Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning [40.544844623958426]
We propose a novel Semantic-Driven Generative Adversarial Network to address the above issues.
Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator.
We construct two types of representational graphs via semantic parsing maps upon input faces, dubbed the IntrA-class Semantic Graph (IASG) and the InteR-class Structure Graph (IRSG).
arXiv Detail & Related papers (2022-01-05T13:14:14Z)
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
One of our main findings: unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.