Conditional Pseudo-Supervised Contrast for Data-Free Knowledge Distillation
- URL: http://arxiv.org/abs/2510.03375v1
- Date: Fri, 03 Oct 2025 13:34:19 GMT
- Title: Conditional Pseudo-Supervised Contrast for Data-Free Knowledge Distillation
- Authors: Renrong Shao, Wei Zhang, Jun Wang
- Abstract summary: We propose a novel learning paradigm, i.e., conditional pseudo-supervised contrast for data-free knowledge distillation (CPSC-DFKD). The primary innovations of CPSC-DFKD are: (1) introducing a conditional generative adversarial network to synthesize category-specific diverse images for pseudo-supervised learning, (2) improving the modules of the generator to distinguish the distributions of different categories, and (3) proposing pseudo-supervised contrastive learning based on teacher and student views to enhance diversity.
- Score: 7.195870730342018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-free knowledge distillation (DFKD) is an effective means of addressing model compression and transmission restrictions while retaining privacy protection, and it has attracted extensive attention in recent years. Most existing methods use a generator to synthesize images that support the distillation. Although current methods have achieved great success, many issues remain to be explored. First, the strong performance of supervised learning in deep learning motivates us to explore a pseudo-supervised paradigm for DFKD. Second, current synthesis methods cannot distinguish the distributions of different categories of samples, and thus produce ambiguous samples that may lead to incorrect evaluation by the teacher. Moreover, current methods cannot optimize the category-wise diversity of samples, which hinders the student model from learning from diverse samples and achieving better performance. In this paper, to address these limitations, we propose a novel learning paradigm, i.e., conditional pseudo-supervised contrast for data-free knowledge distillation (CPSC-DFKD). The primary innovations of CPSC-DFKD are: (1) introducing a conditional generative adversarial network to synthesize category-specific diverse images for pseudo-supervised learning, (2) improving the modules of the generator to distinguish the distributions of different categories, and (3) proposing pseudo-supervised contrastive learning based on teacher and student views to enhance diversity. Comprehensive experiments on three commonly used datasets validate the performance gains of both the student and the generator brought by CPSC-DFKD. The code is available at https://github.com/RoryShao/CPSC-DFKD.git
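The abstract only names the pseudo-supervised contrastive objective; as a rough illustration of the general idea (a sketch, not the authors' implementation), an InfoNCE-style loss over teacher and student views of the same synthesized sample might look like the following, where the cosine similarity and the `temperature` value are common choices rather than details taken from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pseudo_supervised_contrast(student, teacher, temperature=0.5):
    """InfoNCE-style contrast: for each synthetic sample i, the teacher's
    embedding teacher[i] is the positive for the student's embedding
    student[i]; teacher embeddings of other samples act as negatives."""
    loss, n = 0.0, len(student)
    for i in range(n):
        logits = [cosine(student[i], t) / temperature for t in teacher]
        m = max(logits)  # stabilized log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_denom)
    return loss / n
```

When student and teacher views of each sample agree, the positive logit dominates and the loss is small; mismatched pairs drive it up, which is the property such an objective exploits to align the two views.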
Related papers
- BicKD: Bilateral Contrastive Knowledge Distillation [7.791534714823052]
Knowledge distillation (KD) is a machine learning framework that transfers knowledge from a teacher model to a student model. Vanilla KD has been the dominant approach in logit-based distillation. We propose a simple yet effective methodology, bilateral contrastive knowledge distillation (BicKD).
arXiv Detail & Related papers (2026-02-01T14:54:34Z) - Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced to filter and weight teacher representations so that distillation draws only on task-relevant representations. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z) - Relation-Guided Adversarial Learning for Data-free Knowledge Transfer [9.069156418033174]
We introduce a novel Relation-Guided Adversarial Learning (RGAL) method with triplet losses. Our method aims to promote both intra-class diversity and inter-class confusion of the generated samples. RGAL shows significant improvement over previous state-of-the-art methods in accuracy and data efficiency.
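The summary mentions triplet losses without giving their form; a generic triplet margin loss, which a method like RGAL would adapt to encourage intra-class diversity and inter-class confusion, can be sketched as follows (the Euclidean distance and `margin` value are assumptions, not details from the paper):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: push the anchor-positive distance below the
    anchor-negative distance by at least `margin`; zero once satisfied."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```

By choosing which generated samples play the positive and negative roles, such a loss can either pull same-class samples apart (diversity) or push different-class samples together (confusion), which matches the two goals the summary states.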
arXiv Detail & Related papers (2024-12-16T02:11:02Z) - Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition [58.41784639847413]
Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals.
In this paper, a multi-teacher PKD (MT-PKDOT) method with self-distillation is introduced to align diverse teacher representations before distilling them to the student.
Results indicate that our proposed method can outperform SOTA PKD methods.
arXiv Detail & Related papers (2024-08-16T22:11:01Z) - Discriminative and Consistent Representation Distillation [6.24302896438145]
Discriminative and Consistent Distillation (DCD) employs a contrastive loss along with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations. Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives.
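The consistency term described above can be illustrated with a minimal sketch: a KL divergence between temperature-softened teacher and student distributions. Here `temperature` and `bias` are fixed stand-ins for the learnable parameters the summary mentions; their placement is a hypothetical reading, not the paper's exact formulation:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_div(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dcd_consistency(teacher_logits, student_logits, temperature=4.0, bias=0.0):
    """Consistency regularizer: discrepancy between softened teacher and
    student distributions (hypothetical form of the DCD term)."""
    p = softmax([t + bias for t in teacher_logits], temperature)
    q = softmax(student_logits, temperature)
    return kl_div(p, q)
```

The term vanishes when the two distributions coincide and grows as they diverge, which is the behavior a consistency regularizer needs.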
arXiv Detail & Related papers (2024-07-16T14:53:35Z) - Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation [5.710971447109951]
We propose the teacher-agnostic data-free knowledge distillation (TA-DFKD) method.
Our basic idea is to assign the teacher model a lenient expert role for evaluating samples, rather than a strict supervisor that enforces its class-prior on the generator.
Our method successfully achieves both robustness and training stability across various teacher models, while outperforming the existing DFKD methods.
arXiv Detail & Related papers (2024-02-18T08:13:57Z) - Distilling Privileged Multimodal Information for Expression Recognition using Optimal Transport [46.91791643660991]
Deep learning models for multimodal expression recognition have reached remarkable performance in controlled laboratory environments.
In the wild, however, these models struggle because the modalities used for training may be unavailable or degraded in quality.
In practice, only a subset of the training-time modalities may be available at test time.
Learning with privileged information enables models to exploit data from additional modalities that are only available during training.
arXiv Detail & Related papers (2024-01-27T19:44:15Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
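The summary names input gradient alignment but gives no formula; a minimal numerical sketch of the idea is an L2 penalty between the input gradients of a student and a teacher. Finite differences stand in for autograd here, and the scalar-output models are hypothetical, so this is an illustration of the principle rather than KDIGA itself:

```python
def finite_diff_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

def gradient_alignment_penalty(f_teacher, f_student, x):
    """Squared L2 distance between teacher and student input gradients:
    zero when the two models react identically to input perturbations."""
    gt = finite_diff_grad(f_teacher, x)
    gs = finite_diff_grad(f_student, x)
    return sum((a - b) ** 2 for a, b in zip(gt, gs))
```

Matching input gradients constrains how the student's output moves under small input perturbations, which is the mechanism by which such a term can help robustness carry over from the teacher.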
arXiv Detail & Related papers (2021-10-22T21:30:53Z) - Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z) - Adversarial Feature Hallucination Networks for Few-Shot Learning [84.31660118264514]
Adversarial Feature Hallucination Networks (AFHN) is based on conditional Wasserstein Generative Adversarial Networks (cWGAN).
Two novel regularizers are incorporated into AFHN to encourage discriminability and diversity of the synthesized features.
arXiv Detail & Related papers (2020-03-30T02:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.