ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for
Face Recognition
- URL: http://arxiv.org/abs/2011.00265v1
- Date: Sat, 31 Oct 2020 13:14:34 GMT
- Title: ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for
Face Recognition
- Authors: Weidong Shi, Guanghui Ren, Yunpeng Chen, Shuicheng Yan
- Abstract summary: Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one.
In this work, we focus on its application in face recognition.
We propose a novel method named ProxylessKD that directly optimizes face recognition accuracy.
- Score: 84.49978494275382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge Distillation (KD) refers to transferring knowledge from a large
model to a smaller one, which is widely used to enhance model performance in
machine learning. It tries to align embedding spaces generated from the teacher
and the student model (i.e. to make images corresponding to the same semantics
share the same embedding across different models). In this work, we focus on
its application in face recognition. We observe that existing knowledge
distillation models optimize the proxy tasks that force the student to mimic
the teacher's behavior, instead of directly optimizing the face recognition
accuracy. Consequently, the obtained student models are not guaranteed to be
optimal on the target task or able to benefit from advanced constraints, such
as large margin constraints (e.g. margin-based softmax). We then propose a
novel method named ProxylessKD that directly optimizes face recognition
accuracy by inheriting the teacher's classifier as the student's classifier to
guide the student to learn discriminative embeddings in the teacher's embedding
space. The proposed ProxylessKD is very easy to implement and sufficiently
generic to be extended to other tasks beyond face recognition. We conduct
extensive experiments on standard face recognition benchmarks, and the results
demonstrate that ProxylessKD achieves superior performance over existing
knowledge distillation methods.
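As a concrete illustration of the inherited-classifier idea described above, the sketch below copies the teacher's final classification weights into the student's head, freezes them, and trains the student with an additive angular margin (ArcFace-style) softmax on top. The class name, hyper-parameters (scale, margin), and the specific choice of margin-based softmax are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch of ProxylessKD's inherited-classifier training
# (assumed details such as the ArcFace-style margin; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class InheritedMarginHead(nn.Module):
    """Classification head whose weights are inherited from the teacher's
    classifier and frozen, combined with an additive angular margin."""

    def __init__(self, teacher_fc_weight, scale=64.0, margin=0.5):
        super().__init__()
        # Inherit the teacher's class centers and freeze them, so the student
        # must learn discriminative embeddings in the teacher's space.
        self.weight = nn.Parameter(teacher_fc_weight.clone(), requires_grad=False)
        self.scale, self.margin = scale, margin

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class centers.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Apply the angular margin to the ground-truth class only.
        one_hot = F.one_hot(labels, num_classes=cos.size(1)).float()
        logits = self.scale * (one_hot * torch.cos(theta + self.margin)
                               + (1.0 - one_hot) * cos)
        return F.cross_entropy(logits, labels)


# Usage sketch: `student` is any embedding network; `teacher_fc_weight` is the
# (num_classes x embedding_dim) weight of the teacher's final classifier.
# head = InheritedMarginHead(teacher_fc_weight)
# loss = head(student(images), labels)
```

Because the inherited class centers are frozen, minimizing this loss pushes the student's embeddings directly toward the teacher's class centers, so the student is optimized for the recognition objective itself rather than for a proxy mimicry loss.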
Related papers
- AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition [8.045296450065019]
Knowledge distillation aims at improving the performance of a compact student model by distilling the knowledge from a high-performing teacher model.
AdaDistill embeds the KD concept into the softmax loss by training the student using a margin penalty softmax loss with distilled class centers from the teacher.
Extensive experiments and ablation studies show that AdaDistill can enhance the discriminative learning capability of the student.
arXiv Detail & Related papers (2024-07-01T14:39:55Z)
- Cross Architecture Distillation for Face Recognition [49.55061794917994]
We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-06-26T12:54:28Z)
- Improving Knowledge Distillation via Regularizing Feature Norm and Direction [16.98806338782858]
Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task.
Treating teacher features as knowledge, prevailing methods of knowledge distillation train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features.
While it is natural to believe that better alignment of student features to the teacher better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance (a minimal sketch of these alignment losses follows this entry).
arXiv Detail & Related papers (2023-05-26T15:05:19Z)
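For context, the alignment objectives mentioned in the entry above usually take one of two standard forms: a temperature-scaled KL divergence between teacher and student logits, or an L2 (MSE) penalty between intermediate features. A minimal sketch follows; the function names, temperature value, and optional projection layer are illustrative assumptions.

```python
# Generic alignment-based KD losses: KL divergence on softened logits and
# L2 distance on intermediate features (illustrative hyper-parameters).
import torch.nn.functional as F


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Hinton-style KD: KL between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def feature_kd_loss(student_feat, teacher_feat, proj=None):
    # L2 alignment of intermediate features; `proj` is an optional learned
    # layer matching the student's feature dimension to the teacher's.
    if proj is not None:
        student_feat = proj(student_feat)
    return F.mse_loss(student_feat, teacher_feat.detach())
```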
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose the dynamic prior knowledge (DPK), which integrates part of teacher's features as the prior knowledge before the feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- Evaluation-oriented Knowledge Distillation for Deep Face Recognition [19.01023156168511]
We propose a novel Evaluation oriented KD method (EKD) for deep face recognition to directly reduce the performance gap between the teacher and student models during training.
EKD uses the commonly used evaluation metrics in face recognition, i.e., the False Positive Rate (FPR) and True Positive Rate (TPR), as the performance indicator (a toy computation of these indicators is sketched after this entry).
arXiv Detail & Related papers (2022-06-06T02:49:40Z)
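As a small aid to reading the EKD entry above, the helper below computes FPR and TPR at a fixed similarity threshold from face-verification scores. It is a toy illustration of the indicators themselves, not the EKD training objective; the function name and tensor layout are assumptions.

```python
# Toy computation of verification metrics: FPR and TPR at a similarity
# threshold (illustrative helper, not the EKD training loss).
import torch


def fpr_tpr_at_threshold(similarities, same_identity, threshold):
    """similarities: cosine scores for face pairs; same_identity: boolean
    labels (True if the pair shares an identity); threshold: decision cutoff."""
    pred_same = similarities >= threshold
    tp = (pred_same & same_identity).sum().float()
    fp = (pred_same & ~same_identity).sum().float()
    tpr = tp / same_identity.sum().clamp(min=1).float()
    fpr = fp / (~same_identity).sum().clamp(min=1).float()
    return fpr.item(), tpr.item()
```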
- CoupleFace: Relation Matters for Face Recognition Distillation [26.2626768462705]
We propose an effective face recognition distillation method called CoupleFace.
We first propose to mine the informative mutual relations, and then introduce the Relation-Aware Distillation (RAD) loss to transfer the mutual relation knowledge of the teacher model to the student model (a simplified relation-matching sketch follows this entry).
Based on our proposed CoupleFace, we have won the first place in the ICCV21 Masked Face Recognition Challenge (MS1M track)
arXiv Detail & Related papers (2022-04-12T03:25:42Z)
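A simplified sketch of relation matching in the spirit of the RAD loss above: the student's pairwise cosine-similarity ("mutual relation") matrix over a batch is aligned with the teacher's. This is an assumed reconstruction for illustration, not the CoupleFace implementation.

```python
# Simplified relation-aware distillation: align the student's pairwise
# embedding similarities with the teacher's (assumed formulation).
import torch.nn.functional as F


def relation_distillation_loss(student_emb, teacher_emb):
    # Pairwise cosine-similarity ("mutual relation") matrices within a batch.
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    rel_student = s @ s.t()
    rel_teacher = (t @ t.t()).detach()
    # Penalize discrepancies between the two relation structures.
    return F.mse_loss(rel_student, rel_teacher)
```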
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Privileged Knowledge Distillation for Online Action Detection [114.5213840651675]
Online Action Detection (OAD) in videos is proposed as a per-frame labeling task to address the real-time prediction tasks.
This paper presents a novel learning-with-privileged based framework for online action detection where the future frames only observable at the training stages are considered as a form of privileged information.
arXiv Detail & Related papers (2020-11-18T08:52:15Z)