Teacher-Student Training and Triplet Loss to Reduce the Effect of
Drastic Face Occlusion
- URL: http://arxiv.org/abs/2111.10561v1
- Date: Sat, 20 Nov 2021 11:13:46 GMT
- Authors: Mariana-Iuliana Georgescu, Georgian Duta, Radu Tudor Ionescu
- Abstract summary: We show that convolutional neural networks (CNNs) trained on fully-visible faces exhibit very low performance levels when half of the face is occluded.
While fine-tuning the deep learning models on occluded faces is extremely useful, we show that additional performance gains can be obtained by distilling knowledge from models trained on fully-visible faces.
Our main contribution consists in a novel approach for knowledge distillation based on triplet loss, which generalizes across models and tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a series of recognition tasks in two realistic scenarios requiring
the analysis of faces under strong occlusion. On the one hand, we aim to
recognize facial expressions of people wearing Virtual Reality (VR) headsets.
On the other hand, we aim to estimate the age and identify the gender of people
wearing surgical masks. For all these tasks, the common ground is that half of
the face is occluded. In this challenging setting, we show that convolutional
neural networks (CNNs) trained on fully-visible faces exhibit very low
performance levels. While fine-tuning the deep learning models on occluded
faces is extremely useful, we show that additional performance gains can be
obtained by distilling knowledge from models trained on fully-visible faces. To
this end, we study two knowledge distillation methods, one based on
teacher-student training and one based on triplet loss. Our main contribution
consists in a novel approach for knowledge distillation based on triplet loss,
which generalizes across models and tasks. Furthermore, we consider combining
distilled models learned through conventional teacher-student training or
through our novel teacher-student training based on triplet loss. We provide
empirical evidence showing that, in most cases, both individual and combined
knowledge distillation methods bring statistically significant performance
improvements. We conduct experiments with three different neural models (VGG-f,
VGG-face, ResNet-50) on various tasks (facial expression recognition, gender
recognition, age estimation), showing consistent improvements regardless of the
model or task.
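The two kinds of distillation objective named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature value, the margin, and the use of squared Euclidean distance are all assumptions made for illustration. The triplet is assumed to use the student's embedding of an occluded face as the anchor, with the positive and negative being teacher embeddings of fully-visible faces from the same and a different class, respectively.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    A generic soft-target loss for teacher-student training; the
    temperature of 2.0 is an illustrative assumption.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss.

    Assumed roles: anchor = student embedding of an occluded face,
    positive/negative = teacher embeddings of fully-visible faces
    from the same / a different class.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, chosen only to exercise the functions.
    print(triplet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]))
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In this sketch the triplet term is zero once the anchor is closer to the positive than to the negative by at least the margin, so it only penalizes student embeddings of occluded faces that drift away from the teacher's embeddings of matching fully-visible faces.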
Related papers
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models [79.65289816077629]
We present FitDiff, a diffusion-based 3D facial avatar generative model.
Our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image.
Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines.
arXiv Detail & Related papers (2023-12-07T17:35:49Z)
- Effective Adapter for Face Recognition in the Wild [72.75516495170199]
We tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions.
Traditional approaches, which either train models directly on degraded images or on their enhanced counterparts produced by face restoration techniques, have proven ineffective.
We propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets.
arXiv Detail & Related papers (2023-12-04T08:55:46Z)
- A Generative Framework for Self-Supervised Facial Representation Learning [18.094262972295702]
Self-supervised representation learning has gained increasing attention for strong generalization ability without relying on paired datasets.
Self-supervised facial representation learning remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light.
We propose LatentFace, a novel generative framework for self-supervised facial representations.
arXiv Detail & Related papers (2023-09-15T09:34:05Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial
Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO presents an improvement in facial expression recognition performance over six different datasets with very unique affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- CoupleFace: Relation Matters for Face Recognition Distillation [26.2626768462705]
We propose an effective face recognition distillation method called CoupleFace.
We first propose to mine the informative mutual relations, and then introduce the Relation-Aware Distillation (RAD) loss to transfer the mutual relation knowledge of the teacher model to the student model.
Based on our proposed CoupleFace, we won first place in the ICCV21 Masked Face Recognition Challenge (MS1M track).
arXiv Detail & Related papers (2022-04-12T03:25:42Z)
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
One of our two main findings is that unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Teacher-Student Training and Triplet Loss for Facial Expression
Recognition under Occlusion [29.639941810500638]
We are interested in cases where 50% of the face is occluded, e.g. when the subject wears a Virtual Reality (VR) headset.
Previous studies show that pre-training convolutional neural networks (CNNs) on fully-visible faces improves the accuracy.
We propose to employ knowledge distillation to achieve further improvements.
arXiv Detail & Related papers (2020-08-03T16:41:19Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial
Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several tasks of few-shot expression learning.
We propose a unified task-driven framework, the Compositional Generative Adversarial Network (Comp-GAN), which learns to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)