Evaluation-oriented Knowledge Distillation for Deep Face Recognition
- URL: http://arxiv.org/abs/2206.02325v1
- Date: Mon, 6 Jun 2022 02:49:40 GMT
- Title: Evaluation-oriented Knowledge Distillation for Deep Face Recognition
- Authors: Yuge Huang, Jiaxiang Wu, Xingkun Xu, Shouhong Ding
- Abstract summary: We propose a novel Evaluation-oriented KD method (EKD) for deep face recognition that directly reduces the performance gap between the teacher and student models during training.
EKD uses the evaluation metrics commonly used in face recognition, i.e., the False Positive Rate (FPR) and True Positive Rate (TPR), as performance indicators.
- Score: 19.01023156168511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) is a widely-used technique that utilizes large
networks to improve the performance of compact models. Previous KD approaches
usually aim to guide the student to mimic the teacher's behavior completely in
the representation space. However, such one-to-one constraints may lead to
inflexible knowledge transfer from the teacher to the student, especially when
the student has low model capacity. Inspired by the ultimate goal of KD
methods, we propose a novel Evaluation-oriented KD method (EKD) for deep face
recognition that directly reduces the performance gap between the teacher and
student models during training. Specifically, we adopt the evaluation metrics
commonly used in face recognition, i.e., the False Positive Rate (FPR) and
True Positive Rate (TPR), as performance indicators. According to the
evaluation protocol, the critical pair relations that cause the TPR and FPR
difference between the teacher and student models are selected. Then, the
critical relations in the student are constrained to approximate the
corresponding ones in the teacher by a novel rank-based loss function, giving
more flexibility to the student with low capacity. Extensive experimental
results on popular benchmarks demonstrate the superiority of our EKD over
state-of-the-art competitors.
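A minimal sketch of the idea described above, assuming cosine-similarity face embeddings: the teacher's similarity distribution fixes a threshold at a target FPR, negative pairs that the teacher scores above that threshold are treated as the critical relations, and a margin-style ranking penalty keeps those pairs (and the positive pairs) on the correct side of the threshold in the student's similarity space. The function names, threshold selection, and margin value are illustrative assumptions, not the paper's exact rank-based loss.

```python
# Hedged sketch of an evaluation-oriented (FPR/TPR-driven) distillation term.
# Assumes each batch contains both positive and negative pairs.
import torch
import torch.nn.functional as F


def fpr_threshold(neg_sims: torch.Tensor, target_fpr: float) -> torch.Tensor:
    """Similarity threshold such that roughly `target_fpr` of negative pairs exceed it."""
    k = max(1, int(target_fpr * neg_sims.numel()))
    return torch.topk(neg_sims, k).values[-1]


def ekd_style_loss(teacher_emb, student_emb, labels, target_fpr=1e-3, margin=0.01):
    # Cosine similarities for all pairs in the batch.
    t = F.normalize(teacher_emb, dim=1)
    s = F.normalize(student_emb, dim=1)
    t_sim, s_sim = t @ t.t(), s @ s.t()

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask, neg_mask = same & ~eye, ~same

    # Critical negative pairs: those the teacher scores above its FPR threshold.
    thr = fpr_threshold(t_sim[neg_mask], target_fpr)
    crit_neg = neg_mask & (t_sim >= thr)

    # Ranking-style constraints in the student's similarity space:
    # critical negatives should fall below the threshold, positives above it.
    zero = s_sim.new_zeros(())
    loss_neg = F.relu(s_sim[crit_neg] - thr + margin).mean() if crit_neg.any() else zero
    loss_pos = F.relu(thr - s_sim[pos_mask] + margin).mean() if pos_mask.any() else zero
    return loss_pos + loss_neg
```

In a training loop, a term like this would typically be added to the student's usual identification loss, with the teacher embeddings computed under torch.no_grad().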
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z) - Linear Projections of Teacher Embeddings for Few-Class Distillation [14.99228980898161]
Knowledge Distillation (KD) has emerged as a promising approach for transferring knowledge from a larger, more complex teacher model to a smaller student model.
We introduce a novel method for distilling knowledge from the teacher's model representations, which we term Learning Embedding Linear Projections (LELP)
Our experimental evaluation on large-scale NLP benchmarks like Amazon Reviews and Sentiment140 demonstrates that LELP is consistently competitive with, and typically superior to, existing state-of-the-art distillation algorithms for binary and few-class problems.
arXiv Detail & Related papers (2024-09-30T16:07:34Z) - Relational Representation Distillation [6.24302896438145]
We introduce Relational Representation Distillation (RRD) to explore and reinforce relationships between teacher and student models.
Inspired by self-supervised learning principles, it uses a relaxed contrastive loss that focuses on similarity rather than exact replication.
Our approach demonstrates superior performance on CIFAR-100 and ImageNet ILSVRC-2012 and sometimes even outperforms the teacher network when combined with KD.
arXiv Detail & Related papers (2024-07-16T14:56:13Z) - Comparative Knowledge Distillation [102.35425896967791]
Traditional Knowledge Distillation (KD) assumes readily available access to teacher models for frequent inference.
We propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples.
CKD consistently outperforms state of the art data augmentation and KD techniques.
arXiv Detail & Related papers (2023-11-03T21:55:33Z) - Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z) - How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy (see the sketch after this list).
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
arXiv Detail & Related papers (2021-10-22T21:30:53Z) - Learning to Teach with Student Feedback [67.41261090761834]
Interactive Knowledge Distillation (IKD) allows the teacher to learn to teach from the feedback of the student.
IKD trains the teacher model to generate specific soft targets at each training step for a certain student.
Joint optimization for both teacher and student is achieved by two iterative steps.
arXiv Detail & Related papers (2021-09-10T03:01:01Z) - ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition [84.49978494275382]
Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one.
In this work, we focus on its application in face recognition.
We propose a novel method named ProxylessKD that directly optimizes face recognition accuracy.
arXiv Detail & Related papers (2020-10-31T13:14:34Z)
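For the "How and When Adversarial Robustness Transfers in Knowledge Distillation?" entry above, the sketch below illustrates the input-gradient-alignment idea in PyTorch, assuming an image-classification setup; the combined objective, temperature, and weighting factor are assumptions rather than the authors' exact KDIGA formulation.

```python
# Hedged sketch: standard soft-label KD plus a penalty that aligns the
# student's input gradients with the teacher's. Hyperparameters are assumed.
import torch
import torch.nn.functional as F


def kdiga_style_loss(teacher, student, x, labels, temperature=4.0, alpha=1.0):
    # Make the input a leaf tensor that requires gradients.
    x = x.detach().clone().requires_grad_(True)

    # Teacher pass: keep its (detached) soft labels and its input gradient.
    t_logits = teacher(x)
    t_grad = torch.autograd.grad(F.cross_entropy(t_logits, labels), x)[0].detach()
    t_probs = F.softmax(t_logits.detach() / temperature, dim=1)

    # Student pass: create_graph=True so the alignment term can be trained.
    s_logits = student(x)
    s_grad = torch.autograd.grad(F.cross_entropy(s_logits, labels), x,
                                 create_graph=True)[0]

    # Soft-label KD term plus an input-gradient alignment penalty.
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=1), t_probs,
                  reduction="batchmean") * temperature ** 2
    align = (s_grad - t_grad).pow(2).flatten(1).sum(dim=1).mean()
    return kd + alpha * align
```

The alignment term ties the student's local sensitivity to the input to that of the (robust) teacher, which is the mechanism the summary above refers to.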
This list is automatically generated from the titles and abstracts of the papers in this site.