Pre-training strategies and datasets for facial representation learning
- URL: http://arxiv.org/abs/2103.16554v1
- Date: Tue, 30 Mar 2021 17:57:25 GMT
- Title: Pre-training strategies and datasets for facial representation learning
- Authors: Adrian Bulat and Shiyang Cheng and Jing Yang and Andrew Garbett and
Enrique Sanchez and Georgios Tzimiropoulos
- Abstract summary: We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
Our two main findings are: (1) unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements for all facial tasks considered; (2) many existing facial video datasets contain a large amount of redundancy.
- Score: 58.8289362536262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What is the best way to learn a universal face representation? Recent work on
Deep Learning in the area of face analysis has focused on supervised learning
for specific tasks of interest (e.g. face recognition, facial landmark
localization etc.) but has overlooked the overarching question of how to find a
facial representation that can be readily adapted to several facial analysis
tasks and datasets. To this end, we make the following 4 contributions: (a) we
introduce, for the first time, a comprehensive evaluation benchmark for facial
representation learning consisting of 5 important face analysis tasks. (b) We
systematically investigate two ways of large-scale representation learning
applied to faces: supervised and unsupervised pre-training. Importantly, we
focus our evaluations on the case of few-shot facial learning. (c) We
investigate important properties of the training datasets including their size
and quality (labelled, unlabelled or even uncurated). (d) To draw our
conclusions, we conducted a very large number of experiments. Our main two
findings are: (1) Unsupervised pre-training on completely in-the-wild,
uncurated data provides consistent and, in some cases, significant accuracy
improvements for all facial tasks considered. (2) Many existing facial video
datasets seem to have a large amount of redundancy. We will release code,
pre-trained models and data to facilitate future research.
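To make the few-shot evaluation setting above concrete, the following is a minimal sketch of adapting a pre-trained backbone to a small labelled face task by training only a lightweight head. The ResNet-50 backbone, the 5-class head, and the hyper-parameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of few-shot adaptation: freeze a pre-trained backbone and
# train a small task head on a handful of labelled examples.
# The backbone, head size and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)   # load pre-trained weights in practice
backbone.fc = nn.Identity()                # expose 2048-d features
backbone.eval()                            # keep batch-norm statistics fixed
for p in backbone.parameters():
    p.requires_grad = False                # few-shot regime: adapt the head only

head = nn.Linear(2048, 5)                  # e.g. a hypothetical 5-class facial task
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def few_shot_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One training step on a small labelled batch (the few-shot regime)."""
    with torch.no_grad():
        features = backbone(images)        # (B, 2048) frozen features
    logits = head(features)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a random 8-image batch of 224x224 faces:
# few_shot_step(torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,)))
```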
Related papers
- Self-Supervised Facial Representation Learning with Facial Region Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Are Face Detection Models Biased? [69.68854430664399]
We investigate possible bias in the domain of face detection through facial region localization.
Most existing face detection datasets lack suitable annotation for such analysis.
We observe a high disparity in detection accuracies across gender and skin-tone, and an interplay of confounding factors beyond demography.
arXiv Detail & Related papers (2022-11-07T14:27:55Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance on six different datasets with distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- General Facial Representation Learning in a Visual-Linguistic Manner [45.92447707178299]
We introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner (see the illustrative sketch after this entry).
We show that FaRL achieves better transfer performance compared with previous pre-trained models.
Our model surpasses the state-of-the-art methods on face analysis tasks including face parsing and face alignment.
arXiv Detail & Related papers (2021-12-06T15:22:05Z)
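The FaRL summary above does not spell out its training objective. A common way to pre-train in a visual-linguistic manner is a CLIP-style image-text contrastive loss; the sketch below is an illustrative assumption, not FaRL's exact formulation.

```python
# Illustrative CLIP-style image-text contrastive objective (an assumption,
# not necessarily FaRL's actual loss): paired image/text embeddings are
# pulled together, mismatched pairs pushed apart.
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matching pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example: random 8x512 embeddings from hypothetical image and text encoders.
# loss = image_text_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```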
- Teacher-Student Training and Triplet Loss to Reduce the Effect of Drastic Face Occlusion [15.44796695070395]
We show that convolutional neural networks (CNNs) trained on fully-visible faces exhibit very low performance levels when evaluated on occluded faces.
While fine-tuning the deep learning models on occluded faces is extremely useful, we show that additional performance gains can be obtained by distilling knowledge from models trained on fully-visible faces.
Our main contribution is a novel approach to knowledge distillation based on triplet loss, which generalizes across models and tasks (see the illustrative sketch after this entry).
arXiv Detail & Related papers (2021-11-20T11:13:46Z)
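As a rough illustration of the entry above, the following sketches a knowledge-distillation objective based on a triplet loss, assuming the teacher embeds fully-visible faces and the student embeds occluded ones; the paper's exact formulation may differ.

```python
# Minimal sketch of distillation with a triplet loss (illustrative only):
# each student embedding is pulled toward its teacher embedding (positive)
# and pushed away from a mismatched teacher embedding (negative).
import torch
import torch.nn.functional as F

def triplet_distillation_loss(student_emb: torch.Tensor,
                              teacher_emb: torch.Tensor,
                              margin: float = 0.2) -> torch.Tensor:
    """Triplet margin loss between student (anchor) and teacher embeddings."""
    # Negatives: roll the teacher batch so each anchor gets a different sample.
    negative_emb = torch.roll(teacher_emb, shifts=1, dims=0)
    d_pos = F.pairwise_distance(student_emb, teacher_emb)
    d_neg = F.pairwise_distance(student_emb, negative_emb)
    return F.relu(d_pos - d_neg + margin).mean()

# Example with random 16x256 embeddings from hypothetical student/teacher nets:
# loss = triplet_distillation_loss(torch.randn(16, 256), torch.randn(16, 256))
```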
- Towards a Real-Time Facial Analysis System [13.649384403827359]
We present a system-level design of a real-time facial analysis system.
With a collection of deep neural networks for object detection, classification, and regression, the system recognizes age, gender, facial expression, and facial similarity for each person that appears in the camera view.
Results on common off-the-shelf architecture show that the system's accuracy is comparable to the state-of-the-art methods, and the recognition speed satisfies real-time requirements.
arXiv Detail & Related papers (2021-09-21T18:27:15Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several few-shot expression learning tasks.
We propose a unified task-driven framework, a Compositional Generative Adversarial Network (Comp-GAN), that learns to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.