Teacher-Student Training and Triplet Loss for Facial Expression
Recognition under Occlusion
- URL: http://arxiv.org/abs/2008.01003v2
- Date: Thu, 25 Feb 2021 18:54:30 GMT
- Title: Teacher-Student Training and Triplet Loss for Facial Expression
Recognition under Occlusion
- Authors: Mariana-Iuliana Georgescu, Radu Tudor Ionescu
- Abstract summary: We are interested in cases where 50% of the face is occluded, e.g. when the subject wears a Virtual Reality (VR) headset.
Previous studies show that pre-training convolutional neural networks (CNNs) on fully-visible faces improves the accuracy.
We propose to employ knowledge distillation to achieve further improvements.
- Score: 29.639941810500638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the task of facial expression recognition under
strong occlusion. We are particularly interested in cases where 50% of the face
is occluded, e.g. when the subject wears a Virtual Reality (VR) headset. While
previous studies show that pre-training convolutional neural networks (CNNs) on
fully-visible (non-occluded) faces improves the accuracy, we propose to employ
knowledge distillation to achieve further improvements. First of all, we employ
the classic teacher-student training strategy, in which the teacher is a CNN
trained on fully-visible faces and the student is a CNN trained on occluded
faces. Second of all, we propose a new approach for knowledge distillation
based on triplet loss. During training, the goal is to reduce the distance
between an anchor embedding, produced by a student CNN that takes occluded
faces as input, and a positive embedding (from the same class as the anchor),
produced by a teacher CNN trained on fully-visible faces, so that it becomes
smaller than the distance between the anchor and a negative embedding (from a
different class than the anchor), produced by the student CNN. Third of all, we
propose to combine the distilled embeddings obtained through the classic
teacher-student strategy and our novel teacher-student strategy based on
triplet loss into a single embedding vector. We conduct experiments on two
benchmarks, FER+ and AffectNet, with two CNN architectures, VGG-f and VGG-face,
showing that knowledge distillation can bring significant improvements over the
state-of-the-art methods designed for occluded faces in the VR setting.
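A minimal PyTorch sketch of the two strategies described above is given below; the network objects, variable names and margin value are hypothetical placeholders rather than the authors' implementation, and the combination of the two distilled embeddings is illustrated by simple concatenation, which is only one plausible reading of "combine ... into a single embedding vector".

```python
# Minimal sketch of the triplet-loss distillation and the embedding combination
# described in the abstract (hypothetical names, not the authors' code).
import torch
import torch.nn.functional as F


def triplet_distillation_loss(student, teacher, occluded_anchor,
                              visible_positive, occluded_negative, margin=0.2):
    # Anchor: embedding produced by the student CNN on an occluded face.
    anchor = student(occluded_anchor)
    # Positive: embedding produced by the teacher CNN (trained on fully-visible
    # faces) for a fully-visible face of the same class as the anchor.
    with torch.no_grad():
        positive = teacher(visible_positive)
    # Negative: embedding produced by the student CNN for an occluded face of a
    # different class than the anchor.
    negative = student(occluded_negative)
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    # The anchor-positive distance should become smaller than the
    # anchor-negative distance by at least the (assumed) margin.
    return F.relu(d_ap - d_an + margin).mean()


def combined_embedding(student_classic, student_triplet, occluded_face):
    # Combine the embedding distilled with the classic teacher-student strategy
    # and the embedding distilled with the triplet-loss strategy into a single
    # vector (concatenation is assumed here).
    return torch.cat([student_classic(occluded_face),
                      student_triplet(occluded_face)], dim=1)
```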
Related papers
- Distilling Efficient Vision Transformers from CNNs for Semantic
Segmentation [12.177329445930276]
We propose a novel CNN-to-ViT KD framework, dubbed C2VKD.
We first propose a novel visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations.
We then propose a pixel-wise decoupled distillation (PDD) module to supervise the student under the combination of labels and teacher's predictions from the decoupled target and non-target classes.
arXiv Detail & Related papers (2023-10-11T07:45:37Z)
- Cross Architecture Distillation for Face Recognition [49.55061794917994]
We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-06-26T12:54:28Z)
- Triplet Knowledge Distillation [73.39109022280878]
In knowledge distillation, the teacher is generally much larger than the student, so the teacher's solution is likely to be difficult for the student to learn.
To ease the mimicking difficulty, we introduce a triplet knowledge distillation mechanism named TriKD.
arXiv Detail & Related papers (2023-05-25T12:12:31Z)
- UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation [48.49860868061573]
Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis.
However, they require images of a scene from different camera views to be available for one-time training.
This is expensive, especially for scenarios with large-scale scenes and limited data storage.
We design a student-teacher framework to mitigate the catastrophic forgetting problem.
arXiv Detail & Related papers (2022-12-21T11:43:20Z)
- CoupleFace: Relation Matters for Face Recognition Distillation [26.2626768462705]
We propose an effective face recognition distillation method called CoupleFace.
We first propose to mine the informative mutual relations, and then introduce the Relation-Aware Distillation (RAD) loss to transfer the mutual relation knowledge of the teacher model to the student model.
Based on our proposed CoupleFace, we won first place in the ICCV21 Masked Face Recognition Challenge (MS1M track).
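As a rough illustration only, the snippet below sketches in PyTorch what transferring mutual relation knowledge from a teacher to a student can look like; the function name and the plain MSE objective over pairwise cosine similarities are assumptions, not CoupleFace's actual Relation-Aware Distillation loss.

```python
import torch
import torch.nn.functional as F


def relation_matching_loss(student_emb, teacher_emb):
    # Generic sketch (assumed formulation): align the student's pairwise cosine
    # similarities within a batch with the teacher's pairwise similarities.
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    return F.mse_loss(s @ s.t(), (t @ t.t()).detach())
```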
arXiv Detail & Related papers (2022-04-12T03:25:42Z)
- Teacher-Student Training and Triplet Loss to Reduce the Effect of
Drastic Face Occlusion [15.44796695070395]
We show that convolutional neural networks (CNNs) trained on fully-visible faces exhibit very low performance levels when applied to occluded faces.
While fine-tuning the deep learning models on occluded faces is extremely useful, we show that additional performance gains can be obtained by distilling knowledge from models trained on fully-visible faces.
Our main contribution consists in a novel approach for knowledge distillation based on triplet loss, which generalizes across models and tasks.
arXiv Detail & Related papers (2021-11-20T11:13:46Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)