Cross Architecture Distillation for Face Recognition
- URL: http://arxiv.org/abs/2306.14662v1
- Date: Mon, 26 Jun 2023 12:54:28 GMT
- Title: Cross Architecture Distillation for Face Recognition
- Authors: Weisong Zhao, Xiangyu Zhu, Zhixiang He, Xiao-Yu Zhang, Zhen Lei
- Abstract summary: We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
- Score: 49.55061794917994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have emerged as the superior choice for face recognition tasks,
but their insufficient platform acceleration hinders their application on
mobile devices. In contrast, Convolutional Neural Networks (CNNs) capitalize on
hardware-compatible acceleration libraries. Consequently, it has become
indispensable to preserve the distillation efficacy when transferring knowledge
from a Transformer-based teacher model to a CNN-based student model, known as
Cross-Architecture Knowledge Distillation (CAKD). Despite its potential, the
deployment of CAKD in face recognition encounters two challenges: 1) the
teacher and student share disparate spatial information for each pixel,
obstructing the alignment of feature space, and 2) the teacher network is not
trained in the role of a teacher, lacking proficiency in handling
distillation-specific knowledge. To surmount these two constraints, 1) we first
introduce a Unified Receptive Fields Mapping module (URFM) that maps pixel
features of the teacher and student into local features with unified receptive
fields, thereby synchronizing the pixel-wise spatial information of teacher and
student. Subsequently, 2) we develop an Adaptable Prompting Teacher network
(APT) that integrates prompts into the teacher, enabling it to manage
distillation-specific knowledge while preserving the model's discriminative
capacity. Extensive experiments on popular face benchmarks and two large-scale
verification sets demonstrate the superiority of our method.
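As a rough illustration of the prompting idea, the minimal PyTorch sketch below prepends learnable prompt tokens to a frozen Transformer teacher, assuming the teacher is a ViT-style encoder operating on token sequences; the class name PromptedTeacher, the prompt count, and the embedding size are illustrative assumptions, not the authors' APT implementation, and the URFM module is not reproduced.
```python
# Hypothetical sketch of prompting a frozen Transformer teacher: learnable
# prompt tokens are prepended to the token sequence so that only the prompts
# absorb distillation-specific knowledge during training.
import torch
import torch.nn as nn

class PromptedTeacher(nn.Module):
    def __init__(self, teacher_blocks: nn.Module, embed_dim: int = 512, num_prompts: int = 8):
        super().__init__()
        self.teacher_blocks = teacher_blocks            # pretrained Transformer blocks, kept frozen
        for p in self.teacher_blocks.parameters():
            p.requires_grad = False
        # The prompts are the only teacher-side parameters updated during distillation.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings of a face image
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        x = torch.cat([prompts, tokens], dim=1)         # prepend prompts to the sequence
        x = self.teacher_blocks(x)
        return x[:, self.prompts.size(1):]              # drop prompt positions before distillation
```
In such a setup only the prompt tokens receive gradients during distillation, which is one way a frozen teacher can be adapted to the teacher role without disturbing its discriminative capacity; teacher_blocks could, for instance, be an nn.TransformerEncoder built with batch_first=True.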
Related papers
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
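As a rough sketch of the first step, the snippet below gates a feature map in the Fourier domain with a learnable soft mask, keeping selected frequency components before the knowledge is transferred; the module name FourierAdapter, the sigmoid gating, and the per-frequency parameterization are assumptions rather than the 4Ds formulation, and the fusion-activation mechanism is not shown.
```python
# Hypothetical Fourier-domain adapter: a learnable gate over frequency bins
# selects which components of the feature map to keep (intended to retain the
# domain-invariant content) before distillation.
import torch
import torch.nn as nn

class FourierAdapter(nn.Module):
    def __init__(self, height: int, width: int):
        super().__init__()
        # One learnable gate per rFFT frequency bin, shared across batch and channels.
        self.gate = nn.Parameter(torch.zeros(1, 1, height, width // 2 + 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) teacher feature map
        spec = torch.fft.rfft2(feat, norm="ortho")         # complex spectrum
        kept = spec * torch.sigmoid(self.gate)             # soft frequency selection
        return torch.fft.irfft2(kept, s=feat.shape[-2:], norm="ortho")
```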
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation [12.177329445930276]
We propose a novel CNN-to-ViT KD framework, dubbed C2VKD.
We first propose a novel visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations.
We then propose a pixel-wise decoupled distillation (PDD) module to supervise the student under the combination of labels and teacher's predictions from the decoupled target and non-target classes.
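The PDD idea can be approximated by distilling, at every pixel, a binary target-versus-rest distribution and the distribution over the non-target classes separately, in the spirit of decoupled knowledge distillation; the function below is a generic sketch with an assumed temperature and equal term weights, not the paper's exact loss, and ignore-index handling is omitted.
```python
# Generic pixel-wise decoupled distillation sketch: the target/non-target split
# and the distribution among non-target classes are matched separately.
import torch
import torch.nn.functional as F

def decoupled_pixel_kd(student_logits, teacher_logits, labels, tau=2.0):
    # logits: (B, C, H, W); labels: (B, H, W) integer class maps (no ignore index here)
    b, c, h, w = student_logits.shape
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, c) / tau
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, c) / tau
    mask = F.one_hot(labels.reshape(-1), c).bool()          # True at each pixel's target class

    def target_vs_rest(logits):
        pt = logits.softmax(dim=1)[mask].unsqueeze(1)       # probability of the labelled class
        return torch.cat([pt, 1.0 - pt], dim=1).clamp_min(1e-8)

    # (1) Binary agreement on the target class vs. everything else.
    kd_binary = F.kl_div(target_vs_rest(s).log(), target_vs_rest(t), reduction="batchmean")

    # (2) Agreement among non-target classes only (target logit suppressed).
    s_nt = s.masked_fill(mask, -1e4).log_softmax(dim=1)
    t_nt = t.masked_fill(mask, -1e4).softmax(dim=1)
    kd_nontarget = F.kl_div(s_nt, t_nt, reduction="batchmean")

    return (kd_binary + kd_nontarget) * tau * tau
```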
arXiv Detail & Related papers (2023-10-11T07:45:37Z)
- Knowledge Distillation Layer that Lets the Student Decide [6.689381216751284]
We propose a learnable KD layer for the student that improves KD with two distinct abilities:
i) learning how to leverage the teacher's knowledge, enabling it to discard nuisance information, and ii) feeding the transferred knowledge forward into deeper layers.
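One hypothetical reading of such a layer is a gating module in which the student learns per-channel weights that decide how much of the projected teacher feature to absorb before passing the fused result to deeper layers; the design below is an assumption for illustration, not the layer proposed in the paper.
```python
# Hypothetical gated KD layer: learned per-channel gates let the student decide
# how much of the (projected, resized) teacher feature to keep and feed forward.
import torch.nn as nn
import torch.nn.functional as F

class GatedKDLayer(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.project = nn.Conv2d(teacher_dim, student_dim, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(student_dim, student_dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, student_feat, teacher_feat):
        t = self.project(teacher_feat)
        t = F.interpolate(t, size=student_feat.shape[-2:], mode="bilinear", align_corners=False)
        g = self.gate(student_feat)                         # per-channel weights in [0, 1]
        return student_feat + g * t                         # fused feature continues through the student
```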
arXiv Detail & Related papers (2023-09-06T09:05:03Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing distillation methods that rely on pre-defined knowledge representations.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
We publicly release the training and evaluation framework used along this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
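A minimal sketch of the frequency-domain transfer, assuming an orthonormal 2D DCT of attention-style activation maps and a simple mean-squared matching loss; the exact transform normalization and loss used in the paper are not specified in this summary.
```python
# Frequency-domain activation transfer sketch: activation maps are mapped with
# a 2D DCT before matching, so the loss compares frequency content rather than
# raw spatial responses.
import math
import torch
import torch.nn.functional as F

def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis of size (n, n).
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    m = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    m[0] = m[0] / math.sqrt(2.0)
    return m

def dct2(x: torch.Tensor) -> torch.Tensor:
    # x: (B, C, H, W) -> 2D DCT over the last two dimensions.
    ch, cw = dct_matrix(x.size(-2)).to(x), dct_matrix(x.size(-1)).to(x)
    return ch @ x @ cw.transpose(-1, -2)

def dct_transfer_loss(student_map: torch.Tensor, teacher_map: torch.Tensor) -> torch.Tensor:
    # Activation/attention maps of shape (B, 1, H, W); resize to a common grid first.
    if student_map.shape[-2:] != teacher_map.shape[-2:]:
        student_map = F.interpolate(student_map, size=teacher_map.shape[-2:],
                                    mode="bilinear", align_corners=False)
    return F.mse_loss(dct2(student_map), dct2(teacher_map))
```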
arXiv Detail & Related papers (2022-05-04T11:05:18Z)
- Distilling EEG Representations via Capsules for Affective Computing [14.67085109524245]
We propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures.
Our framework consistently enables student networks with different compression ratios to effectively learn from the teacher.
Our method achieves state-of-the-art results on one of the two datasets.
arXiv Detail & Related papers (2021-04-30T22:04:35Z)
- Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose Estimation [1.0323063834827415]
We propose an orderly dual-teacher knowledge distillation (ODKD) framework, which consists of two teachers with different capabilities.
Combining the two teachers, an orderly learning strategy is proposed so that the student can absorb knowledge more effectively.
Our proposed ODKD can improve the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
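Only the high-level scheduling can be guessed from this summary; the toy sketch below has the student distil heatmaps from the first teacher during early epochs and from the second afterwards, with ground-truth supervision throughout. The switch rule, equal loss weights, and MSE objective are assumptions, not the paper's ODKD formulation.
```python
# Toy "orderly" dual-teacher schedule: distil from teacher 1 first, then from
# teacher 2, while always supervising with the ground-truth heatmaps.
import torch.nn.functional as F

def odkd_step_loss(student_hm, teacher1_hm, teacher2_hm, gt_hm, epoch, switch_epoch=60):
    teacher_hm = teacher1_hm if epoch < switch_epoch else teacher2_hm
    return F.mse_loss(student_hm, gt_hm) + F.mse_loss(student_hm, teacher_hm)
```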
arXiv Detail & Related papers (2021-04-21T08:50:36Z)
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
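The dual-form term amounts to a contrastive objective between matched teacher and student representations; the snippet below is a generic InfoNCE-style stand-in (positives on the diagonal of a batch similarity matrix), not WCoRD's actual critic, and the primal-form local matching term is not reproduced.
```python
# Generic contrastive representation-distillation loss: a student embedding and
# the teacher embedding of the same image form a positive pair; all other
# teacher embeddings in the batch act as negatives.
import torch
import torch.nn.functional as F

def contrastive_kd_loss(student_emb, teacher_emb, temperature=0.07):
    # embeddings: (B, D); normalise so the dot product is a cosine similarity
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    logits = s @ t.t() / temperature                       # (B, B) similarity matrix
    labels = torch.arange(s.size(0), device=s.device)      # positives on the diagonal
    return F.cross_entropy(logits, labels)
```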
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
- ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition [84.49978494275382]
Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one.
In this work, we focus on its application in face recognition.
We propose a novel method named ProxylessKD that directly optimizes face recognition accuracy.
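A minimal sketch of the inherited-classifier idea: the student's embedding is scored against the teacher's frozen identity prototypes, so the student is optimized directly for recognition accuracy in the teacher's embedding space. The cosine-softmax form and scale value are assumptions, and margin terms are omitted.
```python
# Hypothetical inherited-classifier distillation: the student backbone is trained
# through the teacher's frozen classification head, directly optimizing the
# recognition objective in the teacher's embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InheritedClassifierKD(nn.Module):
    def __init__(self, student_backbone: nn.Module, teacher_classifier_weight: torch.Tensor):
        super().__init__()
        self.student = student_backbone
        # Teacher's identity prototypes (num_ids, emb_dim), kept fixed.
        self.register_buffer("prototypes", F.normalize(teacher_classifier_weight, dim=1))

    def forward(self, images: torch.Tensor, labels: torch.Tensor, scale: float = 64.0):
        emb = F.normalize(self.student(images), dim=1)     # (B, emb_dim) student embeddings
        logits = scale * emb @ self.prototypes.t()         # cosine logits over identities
        return F.cross_entropy(logits, labels)
```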
arXiv Detail & Related papers (2020-10-31T13:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.