Self-Supervised Facial Representation Learning with Facial Region
Awareness
- URL: http://arxiv.org/abs/2403.02138v1
- Date: Mon, 4 Mar 2024 15:48:56 GMT
- Title: Self-Supervised Facial Representation Learning with Facial Region
Awareness
- Authors: Zheng Gao, Ioannis Patras
- Abstract summary: Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
- Score: 13.06996608324306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised pre-training has been proved to be effective in learning
transferable representations that benefit various visual tasks. This paper asks
this question: can self-supervised pre-training learn general facial
representations for various facial analysis tasks? Recent efforts toward this
goal are limited to treating each face image as a whole, i.e., learning
consistent facial representations at the image-level, which overlooks the
consistency of local facial representations (i.e., facial regions like eyes,
nose, etc.). In this work, we make a first attempt at a novel
self-supervised facial representation learning framework, Facial Region
Awareness (FRA), which learns consistent global and local facial representations.
Specifically, we explicitly enforce the consistency of facial regions by
matching the local facial representations across views, which are extracted
with learned heatmaps highlighting the facial regions. Inspired by the mask
prediction in supervised semantic segmentation, we obtain the heatmaps via
cosine similarity between the per-pixel projection of feature maps and facial
mask embeddings computed from learnable positional embeddings, which leverage
the attention mechanism to globally look up the facial image for facial
regions. To learn such heatmaps, we formulate the learning of facial mask
embeddings as a deep clustering problem by assigning the pixel features from
the feature maps to them. The transfer learning results on facial
classification and regression tasks show that our FRA outperforms previous
pre-trained models and, more importantly, with ResNet as the unified backbone
for various tasks, our FRA achieves performance comparable to or better than
SOTA methods in facial analysis tasks.
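The heatmap step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax over regions, the temperature value, the heatmap-weighted average pooling, and the cosine-based matching loss are all assumptions chosen for illustration. The deep clustering objective and the attention mechanism that produces the mask embeddings are omitted.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def region_heatmaps(pixel_feats, mask_embeds, temperature=0.1):
    """Cosine similarity between per-pixel features and K mask embeddings.

    pixel_feats: (H, W, D) projected feature map.
    mask_embeds: (K, D) facial mask embeddings.
    Returns (K, H, W) heatmaps. The softmax over the K regions and the
    temperature are assumptions, not details taken from the abstract.
    """
    h, w, d = pixel_feats.shape
    p = l2_normalize(pixel_feats.reshape(h * w, d))
    m = l2_normalize(mask_embeds)
    logits = (m @ p.T) / temperature           # (K, H*W) scaled cosine similarity
    logits = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=0, keepdims=True)
    return probs.reshape(-1, h, w)

def pool_local_features(pixel_feats, heatmaps, eps=1e-8):
    """Heatmap-weighted average of pixel features, one vector per region: (K, D)."""
    k, h, w = heatmaps.shape
    weights = heatmaps.reshape(k, h * w)
    weights = weights / (weights.sum(axis=1, keepdims=True) + eps)
    return weights @ pixel_feats.reshape(h * w, -1)

def local_consistency_loss(local_a, local_b):
    """Mean (2 - 2 * cosine) between matching regions of two views (assumed loss form)."""
    a, b = l2_normalize(local_a), l2_normalize(local_b)
    return float(np.mean(2.0 - 2.0 * np.sum(a * b, axis=1)))

# Toy usage with random stand-ins for two augmented views of one face image.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(7, 7, 16))
view_b = view_a + 0.1 * rng.normal(size=(7, 7, 16))
embeds = rng.normal(size=(4, 16))              # K = 4 hypothetical facial regions
local_a = pool_local_features(view_a, region_heatmaps(view_a, embeds))
local_b = pool_local_features(view_b, region_heatmaps(view_b, embeds))
loss = local_consistency_loss(local_a, local_b)
```

In this sketch each pixel distributes its weight across the K regions, so a region's local representation is dominated by the pixels most similar to its embedding. Matching `local_a` against `local_b` is one simple way to express the consistency of facial regions across views.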
Related papers
- LAFS: Landmark-based Facial Self-supervised Learning for Face
Recognition [37.4550614524874]
We focus on learning facial representations that can be adapted to train effective face recognition models.
We explore the learning strategy of unlabeled facial images through self-supervised pretraining.
Our method achieves significant improvement over the state-of-the-art on multiple face recognition benchmarks.
arXiv Detail & Related papers (2024-03-13T01:07:55Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- Toward High Quality Facial Representation Learning [58.873356953627614]
We propose a self-supervised pre-training framework called Mask Contrastive Face (MCF).
We use the feature map of a pre-trained visual backbone as a supervision item and use a partially pre-trained decoder for mask image modeling.
Our model achieves 0.932 NME$_{diag}$ for AFLW-19 face alignment and 93.96 F1 score for LaPa face parsing.
arXiv Detail & Related papers (2023-09-07T09:11:49Z)
- SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial
Expression Recognition in the Wild [3.4798852684389963]
We propose a self-supervised simple facial landmark encoding (SimFLE) method that can learn effective encoding of facial landmarks.
We introduce a novel FaceMAE module for this purpose.
Experimental results on several FER-W benchmarks prove that the proposed SimFLE is superior in facial landmark localization.
arXiv Detail & Related papers (2023-03-14T06:30:55Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial
Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory AdaptatiOn (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO presents an improvement in facial expression recognition performance over six different datasets with very unique affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- General Facial Representation Learning in a Visual-Linguistic Manner [45.92447707178299]
We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
arXiv Detail & Related papers (2021-12-06T15:22:05Z)
- Learning Facial Representations from the Cycle-consistency of Face [23.23272327438177]
We introduce cycle-consistency in facial characteristics as a free supervisory signal to learn facial representations from unlabeled facial images.
The learning is realized by superimposing the facial motion cycle-consistency and identity cycle-consistency constraints.
Our approach is competitive with those of existing methods, demonstrating the rich and unique information embedded in the disentangled representations.
arXiv Detail & Related papers (2021-08-07T11:30:35Z)
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
Our main two findings are: Unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)
- Spontaneous Emotion Recognition from Facial Thermal Images [0.0]
We observe that a large number of tasks for facial image processing in thermal infrared images can be addressed with modern learning-based approaches.
We have used the USTC-NVIE database to train a number of machine learning algorithms for facial landmark localization.
arXiv Detail & Related papers (2020-12-13T05:55:19Z)
- DotFAN: A Domain-transferred Face Augmentation Network for Pose and
Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN).
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.