Cross Architecture Distillation for Face Recognition
- URL: http://arxiv.org/abs/2306.14662v1
- Date: Mon, 26 Jun 2023 12:54:28 GMT
- Title: Cross Architecture Distillation for Face Recognition
- Authors: Weisong Zhao, Xiangyu Zhu, Zhixiang He, Xiao-Yu Zhang, Zhen Lei
- Abstract summary: We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
- Score: 49.55061794917994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have emerged as the superior choice for face recognition tasks,
but their insufficient platform acceleration hinders their application on
mobile devices. In contrast, Convolutional Neural Networks (CNNs) capitalize on
hardware-compatible acceleration libraries. Consequently, it has become
indispensable to preserve the distillation efficacy when transferring knowledge
from a Transformer-based teacher model to a CNN-based student model, known as
Cross-Architecture Knowledge Distillation (CAKD). Despite its potential, the
deployment of CAKD in face recognition encounters two challenges: 1) the
teacher and student share disparate spatial information for each pixel,
obstructing the alignment of feature space, and 2) the teacher network is not
trained in the role of a teacher, lacking proficiency in handling
distillation-specific knowledge. To surmount these two constraints, 1) we first
introduce a Unified Receptive Fields Mapping module (URFM) that maps pixel
features of the teacher and student into local features with unified receptive
fields, thereby synchronizing the pixel-wise spatial information of teacher and
student. Subsequently, 2) we develop an Adaptable Prompting Teacher network
(APT) that integrates prompts into the teacher, enabling it to manage
distillation-specific knowledge while preserving the model's discriminative
capacity. Extensive experiments on popular face benchmarks and two large-scale
verification sets demonstrate the superiority of our method.
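As a rough illustration of the prompting idea, the minimal PyTorch sketch below prepends learnable prompt tokens to a frozen Transformer teacher, assuming the teacher is a ViT-style encoder operating on token sequences; the class name PromptedTeacher, the prompt count, and the embedding size are illustrative assumptions, not the authors' APT implementation, and the URFM module is not reproduced.
```python
# Hypothetical sketch of prompting a frozen Transformer teacher: learnable
# prompt tokens are prepended to the token sequence so that only the prompts
# absorb distillation-specific knowledge during training.
import torch
import torch.nn as nn

class PromptedTeacher(nn.Module):
    def __init__(self, teacher_blocks: nn.Module, embed_dim: int = 512, num_prompts: int = 8):
        super().__init__()
        self.teacher_blocks = teacher_blocks            # pretrained Transformer blocks, kept frozen
        for p in self.teacher_blocks.parameters():
            p.requires_grad = False
        # The prompts are the only teacher-side parameters updated during distillation.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings of a face image
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        x = torch.cat([prompts, tokens], dim=1)         # prepend prompts to the sequence
        x = self.teacher_blocks(x)
        return x[:, self.prompts.size(1):]              # drop prompt positions before distillation
```
In such a setup only the prompt tokens receive gradients during distillation, which is one way a frozen teacher can be adapted to the teacher role without disturbing its discriminative capacity; teacher_blocks could, for instance, be an nn.TransformerEncoder built with batch_first=True.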
Related papers
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
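As a rough sketch of the first step, the snippet below gates a feature map in the Fourier domain with a learnable soft mask, keeping selected frequency components before the knowledge is transferred; the module name FourierAdapter, the sigmoid gating, and the per-frequency parameterization are assumptions rather than the 4Ds formulation, and the fusion-activation mechanism is not shown.
```python
# Hypothetical Fourier-domain adapter: a learnable gate over frequency bins
# selects which components of the feature map to keep (intended to retain the
# domain-invariant content) before distillation.
import torch
import torch.nn as nn

class FourierAdapter(nn.Module):
    def __init__(self, height: int, width: int):
        super().__init__()
        # One learnable gate per rFFT frequency bin, shared across batch and channels.
        self.gate = nn.Parameter(torch.zeros(1, 1, height, width // 2 + 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) teacher feature map
        spec = torch.fft.rfft2(feat, norm="ortho")         # complex spectrum
        kept = spec * torch.sigmoid(self.gate)             # soft frequency selection
        return torch.fft.irfft2(kept, s=feat.shape[-2:], norm="ortho")
```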
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation [12.177329445930276]
We propose a novel CNN-to-ViT KD framework, dubbed C2VKD.
We first propose a novel visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations.
We then propose a pixel-wise decoupled distillation (PDD) module to supervise the student under the combination of labels and teacher's predictions from the decoupled target and non-target classes.
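The PDD idea can be approximated by distilling, at every pixel, a binary target-versus-rest distribution and the distribution over the non-target classes separately, in the spirit of decoupled knowledge distillation; the function below is a generic sketch with an assumed temperature and equal term weights, not the paper's exact loss, and ignore-index handling is omitted.
```python
# Generic pixel-wise decoupled distillation sketch: the target/non-target split
# and the distribution among non-target classes are matched separately.
import torch
import torch.nn.functional as F

def decoupled_pixel_kd(student_logits, teacher_logits, labels, tau=2.0):
    # logits: (B, C, H, W); labels: (B, H, W) integer class maps (no ignore index here)
    b, c, h, w = student_logits.shape
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, c) / tau
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, c) / tau
    mask = F.one_hot(labels.reshape(-1), c).bool()          # True at each pixel's target class

    def target_vs_rest(logits):
        pt = logits.softmax(dim=1)[mask].unsqueeze(1)       # probability of the labelled class
        return torch.cat([pt, 1.0 - pt], dim=1).clamp_min(1e-8)

    # (1) Binary agreement on the target class vs. everything else.
    kd_binary = F.kl_div(target_vs_rest(s).log(), target_vs_rest(t), reduction="batchmean")

    # (2) Agreement among non-target classes only (target logit suppressed).
    s_nt = s.masked_fill(mask, -1e4).log_softmax(dim=1)
    t_nt = t.masked_fill(mask, -1e4).softmax(dim=1)
    kd_nontarget = F.kl_div(s_nt, t_nt, reduction="batchmean")

    return (kd_binary + kd_nontarget) * tau * tau
```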
arXiv Detail & Related papers (2023-10-11T07:45:37Z)
- Knowledge Distillation Layer that Lets the Student Decide [6.689381216751284]
We propose a learnable KD layer for the student that improves KD with two distinct abilities:
i) learning how to leverage the teacher's knowledge, enabling it to discard nuisance information, and ii) feeding the transferred knowledge forward into deeper layers.
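One hypothetical reading of such a layer is a gating module in which the student learns per-channel weights that decide how much of the projected teacher feature to absorb before passing the fused result to deeper layers; the design below is an assumption for illustration, not the layer proposed in the paper.
```python
# Hypothetical gated KD layer: learned per-channel gates let the student decide
# how much of the (projected, resized) teacher feature to keep and feed forward.
import torch.nn as nn
import torch.nn.functional as F

class GatedKDLayer(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.project = nn.Conv2d(teacher_dim, student_dim, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(student_dim, student_dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, student_feat, teacher_feat):
        t = self.project(teacher_feat)
        t = F.interpolate(t, size=student_feat.shape[-2:], mode="bilinear", align_corners=False)
        g = self.gate(student_feat)                         # per-channel weights in [0, 1]
        return student_feat + g * t                         # fused feature continues through the student
```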
arXiv Detail & Related papers (2023-09-06T09:05:03Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing distillation methods that rely on pre-defined knowledge representations.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
We publicly release the training and evaluation framework used along this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
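A minimal sketch of the frequency-domain transfer, assuming an orthonormal 2D DCT of attention-style activation maps and a simple mean-squared matching loss; the exact transform normalization and loss used in the paper are not specified in this summary.
```python
# Frequency-domain activation transfer sketch: activation maps are mapped with
# a 2D DCT before matching, so the loss compares frequency content rather than
# raw spatial responses.
import math
import torch
import torch.nn.functional as F

def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis of size (n, n).
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    m = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    m[0] = m[0] / math.sqrt(2.0)
    return m

def dct2(x: torch.Tensor) -> torch.Tensor:
    # x: (B, C, H, W) -> 2D DCT over the last two dimensions.
    ch, cw = dct_matrix(x.size(-2)).to(x), dct_matrix(x.size(-1)).to(x)
    return ch @ x @ cw.transpose(-1, -2)

def dct_transfer_loss(student_map: torch.Tensor, teacher_map: torch.Tensor) -> torch.Tensor:
    # Activation/attention maps of shape (B, 1, H, W); resize to a common grid first.
    if student_map.shape[-2:] != teacher_map.shape[-2:]:
        student_map = F.interpolate(student_map, size=teacher_map.shape[-2:],
                                    mode="bilinear", align_corners=False)
    return F.mse_loss(dct2(student_map), dct2(teacher_map))
```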
arXiv Detail & Related papers (2022-05-04T11:05:18Z)
- Distilling EEG Representations via Capsules for Affective Computing [14.67085109524245]
We propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures.
Our framework consistently enables student networks with different compression ratios to effectively learn from the teacher.
Our method achieves state-of-the-art results on one of the two datasets.
arXiv Detail & Related papers (2021-04-30T22:04:35Z)
- Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose Estimation [1.0323063834827415]
We propose an orderly dual-teacher knowledge distillation (ODKD) framework, which consists of two teachers with different capabilities.
Combining the two teachers, an orderly learning strategy is proposed so that the student can absorb knowledge more effectively.
Our proposed ODKD can improve the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
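Only the high-level scheduling can be guessed from this summary; the toy sketch below has the student distil heatmaps from the first teacher during early epochs and from the second afterwards, with ground-truth supervision throughout. The switch rule, equal loss weights, and MSE objective are assumptions, not the paper's ODKD formulation.
```python
# Toy "orderly" dual-teacher schedule: distil from teacher 1 first, then from
# teacher 2, while always supervising with the ground-truth heatmaps.
import torch.nn.functional as F

def odkd_step_loss(student_hm, teacher1_hm, teacher2_hm, gt_hm, epoch, switch_epoch=60):
    teacher_hm = teacher1_hm if epoch < switch_epoch else teacher2_hm
    return F.mse_loss(student_hm, gt_hm) + F.mse_loss(student_hm, teacher_hm)
```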
arXiv Detail & Related papers (2021-04-21T08:50:36Z)
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
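The dual-form term amounts to a contrastive objective between matched teacher and student representations; the snippet below is a generic InfoNCE-style stand-in (positives on the diagonal of a batch similarity matrix), not WCoRD's actual critic, and the primal-form local matching term is not reproduced.
```python
# Generic contrastive representation-distillation loss: a student embedding and
# the teacher embedding of the same image form a positive pair; all other
# teacher embeddings in the batch act as negatives.
import torch
import torch.nn.functional as F

def contrastive_kd_loss(student_emb, teacher_emb, temperature=0.07):
    # embeddings: (B, D); normalise so the dot product is a cosine similarity
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    logits = s @ t.t() / temperature                       # (B, B) similarity matrix
    labels = torch.arange(s.size(0), device=s.device)      # positives on the diagonal
    return F.cross_entropy(logits, labels)
```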
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
- ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition [84.49978494275382]
Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one.
In this work, we focus on its application in face recognition.
We propose a novel method named ProxylessKD that directly optimizes face recognition accuracy.
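A minimal sketch of the inherited-classifier idea: the student's embedding is scored against the teacher's frozen identity prototypes, so the student is optimized directly for recognition accuracy in the teacher's embedding space. The cosine-softmax form and scale value are assumptions, and margin terms are omitted.
```python
# Hypothetical inherited-classifier distillation: the student backbone is trained
# through the teacher's frozen classification head, directly optimizing the
# recognition objective in the teacher's embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InheritedClassifierKD(nn.Module):
    def __init__(self, student_backbone: nn.Module, teacher_classifier_weight: torch.Tensor):
        super().__init__()
        self.student = student_backbone
        # Teacher's identity prototypes (num_ids, emb_dim), kept fixed.
        self.register_buffer("prototypes", F.normalize(teacher_classifier_weight, dim=1))

    def forward(self, images: torch.Tensor, labels: torch.Tensor, scale: float = 64.0):
        emb = F.normalize(self.student(images), dim=1)     # (B, emb_dim) student embeddings
        logits = scale * emb @ self.prototypes.t()         # cosine logits over identities
        return F.cross_entropy(logits, labels)
```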
arXiv Detail & Related papers (2020-10-31T13:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.