Enriching Knowledge Distillation with Intra-Class Contrastive Learning
- URL: http://arxiv.org/abs/2509.22053v1
- Date: Fri, 26 Sep 2025 08:35:34 GMT
- Title: Enriching Knowledge Distillation with Intra-Class Contrastive Learning
- Authors: Hua Yuan, Ning Xu, Xin Geng, Yong Rui
- Abstract summary: We propose incorporating an intra-class contrastive loss during teacher training to enrich the intra-class information contained in soft labels. We prove that the intra-class contrastive loss enriches intra-class diversity.
- Score: 40.40889547725741
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the advent of knowledge distillation, much research has focused on how the soft labels generated by the teacher model can be utilized effectively. Existing studies point out that the implicit knowledge within soft labels originates from the multi-view structure present in the data: feature variations among samples of the same class allow the student model to generalize better by learning diverse representations. However, in existing distillation methods, teacher models predominantly adhere to ground-truth labels as targets, without considering the diverse representations within the same class. We therefore propose incorporating an intra-class contrastive loss during teacher training to enrich the intra-class information contained in soft labels. In practice, we find that the intra-class loss causes training instability and slows convergence. To mitigate these issues, a margin loss is integrated into the intra-class contrastive learning to improve training stability and convergence speed. We also theoretically analyze the impact of this loss on intra-class and inter-class distances, proving that the intra-class contrastive loss enriches intra-class diversity. Experimental results demonstrate the effectiveness of the proposed method.
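The abstract does not give the exact form of the loss, so the following PyTorch sketch is only one plausible reading of the idea: same-class embeddings are pushed apart whenever their cosine similarity exceeds a margin threshold, and the resulting term is added to the teacher's cross-entropy objective. The margin value, the weight `lam`, and the cosine-similarity formulation are assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def intra_class_contrastive_loss(features, labels, margin=0.5):
    """Push same-class embeddings apart, but only while their cosine
    similarity still exceeds (1 - margin); separated pairs get no gradient."""
    z = F.normalize(features, dim=1)                # unit-norm embeddings
    sim = z @ z.t()                                 # pairwise cosine similarity
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    off_diag = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_pairs = same_class & off_diag               # intra-class pairs only
    if not pos_pairs.any():
        return features.new_zeros(())
    # Penalize only the similarity that exceeds the margin threshold.
    return torch.clamp(sim[pos_pairs] - (1.0 - margin), min=0.0).mean()

def teacher_loss(logits, features, labels, lam=0.1):
    """Teacher objective: cross-entropy plus the intra-class diversity term."""
    return F.cross_entropy(logits, labels) + lam * intra_class_contrastive_loss(features, labels)
```

Clamping at the margin means well-separated pairs contribute no gradient, which is one way a margin term can stabilize training and speed convergence, as the abstract reports.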
Related papers
- Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation [16.35331551561344]
Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks. This paper explores the factors underlying this problem and points to the differing smoothness degrees of soft labels across classes. We propose Anti-Bias Soft Label Distillation (ABSLD) within the Knowledge Distillation framework to enhance adversarial robust fairness.
arXiv Detail & Related papers (2025-06-10T09:20:34Z)
- Weakly-Supervised Contrastive Learning for Imprecise Class Labels [50.57424331797865]
We introduce the concept of "continuous semantic similarity" to define positive and negative pairs. We propose a graph-theoretic framework for weakly-supervised contrastive learning. Our framework is highly versatile and can be applied to many weakly-supervised learning scenarios.
arXiv Detail & Related papers (2025-05-28T06:50:40Z)
- Relation-Guided Adversarial Learning for Data-free Knowledge Transfer [9.069156418033174]
We introduce a novel Relation-Guided Adversarial Learning (RGAL) method with triplet losses. Our method aims to promote both intra-class diversity and inter-class confusion of the generated samples. RGAL shows significant improvement over previous state-of-the-art methods in accuracy and data efficiency.
arXiv Detail & Related papers (2024-12-16T02:11:02Z)
- Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated Learning [2.395881636777087]
We observe that certain weak classes consistently exist even for class-balanced learning. The inherent inter-class accuracy discrepancy can reach over 36.9% for federated learning on the FashionMNIST and CIFAR-10 datasets. A partial knowledge distillation (PKD) method is proposed to improve the model's classification accuracy for weak classes.
arXiv Detail & Related papers (2024-11-23T01:16:46Z)
- Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy with data augmentation (DA) can come at the cost of significantly hurting individual class accuracy, by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
- Center Contrastive Loss for Metric Learning [8.433000039153407]
We propose a novel metric learning function called Center Contrastive Loss.
It maintains a class-wise center bank and compares the category centers with the query data points using a contrastive loss.
The proposed loss combines the advantages of both contrastive and classification methods.
arXiv Detail & Related papers (2023-08-01T11:22:51Z)
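As a companion to the Center Contrastive Loss entry above, here is a minimal sketch of the center-bank idea, assuming EMA-updated class centers and a softmax-style contrast of each query against all centers; the update rule, momentum, and temperature are assumptions rather than that paper's exact design.

```python
import torch
import torch.nn.functional as F

class CenterContrastiveLoss(torch.nn.Module):
    def __init__(self, num_classes, dim, momentum=0.9, temperature=0.1):
        super().__init__()
        self.register_buffer("centers", F.normalize(torch.randn(num_classes, dim), dim=1))
        self.momentum = momentum
        self.temperature = temperature

    @torch.no_grad()
    def _update_centers(self, z, labels):
        # EMA update of each class center from the batch mean (assumed rule).
        for c in labels.unique():
            batch_mean = z[labels == c].mean(dim=0)
            blended = self.momentum * self.centers[c] + (1 - self.momentum) * batch_mean
            self.centers[c] = F.normalize(blended, dim=0)

    def forward(self, features, labels):
        z = F.normalize(features, dim=1)
        self._update_centers(z, labels)
        # Each query is contrasted against every class center: its own
        # center is the positive, all other centers are negatives.
        logits = z @ self.centers.t() / self.temperature
        return F.cross_entropy(logits, labels)
```

Contrasting against a small bank of centers rather than all other samples is what lets this family of losses combine contrastive and classification behavior, as the entry notes.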
- The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages substage-based learning to distill knowledge according to the combination of student substages and the correlation between corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
arXiv Detail & Related papers (2023-07-11T12:10:42Z)
- Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two types of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
- Multi-head Knowledge Distillation for Model Compression [65.58705111863814]
We propose a simple-to-implement method using auxiliary classifiers at intermediate layers for matching features (see the sketch after this list).
We show that the proposed method outperforms prior relevant approaches presented in the literature.
arXiv Detail & Related papers (2020-12-05T00:49:14Z)
- Spectrum-Guided Adversarial Disparity Learning [52.293230153385124]
We propose a novel end-to-end knowledge directed adversarial learning framework.
It portrays the class-conditioned intra-class disparity using two competitive encoding distributions and learns purified latent codes by denoising the learned disparity.
The experiments on four HAR benchmark datasets demonstrate the robustness and generalization of the proposed method over a set of state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-14T05:46:27Z)
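As referenced in the Multi-head Knowledge Distillation entry above, here is a hedged sketch of the auxiliary-classifier idea: intermediate student features feed small heads whose logits are also matched to the teacher's soft labels. The head design, temperature, and loss weighting below are assumptions, not that paper's specification.

```python
import torch.nn.functional as F
from torch import nn

class AuxHead(nn.Module):
    """Pools an intermediate feature map and classifies it."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, feat):                        # feat: (N, C, H, W)
        return self.fc(self.pool(feat).flatten(1))

def kd_term(student_logits, teacher_logits, T=4.0):
    """Hinton-style soft-label matching between one head and the teacher."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def multi_head_kd_loss(final_logits, aux_logits_list, teacher_logits, labels, alpha=0.5):
    # Final head: supervised loss plus distillation; every auxiliary head
    # attached to an intermediate layer also mimics the teacher's soft labels.
    loss = F.cross_entropy(final_logits, labels) + alpha * kd_term(final_logits, teacher_logits)
    for aux_logits in aux_logits_list:
        loss = loss + alpha * kd_term(aux_logits, teacher_logits)
    return loss
```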