CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
- URL: http://arxiv.org/abs/2404.14109v2
- Date: Tue, 25 Mar 2025 06:36:10 GMT
- Title: CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
- Authors: Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu
- Abstract summary: We propose a contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Our approach transfers "dark knowledge" through teacher-student contrastive alignment at the sample level. We conduct comprehensive experiments across three benchmark datasets, including the CIFAR-100, ImageNet-1K, and MS COCO datasets.
- Score: 48.99488315273868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches exhibit over-reliance on feature similarity per sample, which risks overfitting, and contrastive approaches focus on inter-class discrimination at the expense of intra-sample semantic relationships. Our approach transfers "dark knowledge" through teacher-student contrastive alignment at the sample level. Specifically, our method first enforces intra-sample alignment by directly minimizing teacher-student logit discrepancies within individual samples. Then, we utilize inter-sample contrasts to preserve semantic dissimilarities across samples. By redefining positive pairs as aligned teacher-student logits from identical samples and negative pairs as cross-sample logit combinations, we reformulate these dual constraints into an InfoNCE loss framework, reducing computational complexity to below the square of the number of samples while eliminating dependencies on temperature parameters and large batch sizes. We conduct comprehensive experiments across three benchmark datasets, including the CIFAR-100, ImageNet-1K, and MS COCO datasets, and experimental results clearly confirm the effectiveness of the proposed method on image classification, object detection, and instance segmentation tasks.
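The pairing described in the abstract (positives are teacher-student logits of the same sample, negatives are cross-sample combinations) can be illustrated with a generic InfoNCE computation. The following is a minimal sketch only, not the authors' implementation: the function name `sample_wise_infonce`, the use of cosine similarity, and the absence of a temperature term are assumptions made for illustration, and the paper's actual similarity measure and loss weighting may differ.

```python
import numpy as np


def sample_wise_infonce(student_logits, teacher_logits):
    """Hypothetical sketch: InfoNCE over teacher-student logit pairs.

    Both inputs have shape (batch, num_classes). Row i of each array
    belongs to the same input sample, so (student i, teacher i) is the
    positive pair and (student i, teacher j != i) are the negatives.
    """
    s = np.asarray(student_logits, dtype=np.float64)
    t = np.asarray(teacher_logits, dtype=np.float64)

    # Assumption: L2-normalise rows so the dot product is a cosine similarity.
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)

    # sim[i, j] compares student sample i with teacher sample j.
    sim = s @ t.T

    # Numerically stable row-wise log-softmax (no temperature in this sketch).
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # The diagonal holds the positive pairs; average their negative log-probability.
    return float(-np.mean(np.diag(log_prob)))


# Illustrative usage with random logits (8 samples, 10 classes).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 10))
student = teacher + 0.1 * rng.normal(size=(8, 10))
loss = sample_wise_infonce(student, teacher)
```

Minimizing this quantity pulls each student logit vector toward its own teacher logit vector while pushing it away from the teacher logits of other samples in the batch, which is the intra-sample alignment and inter-sample contrast the abstract refers to.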
Related papers
- Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification [1.292108130501585]
We propose a clustering-driven feature fine-tuning method (HC-FT) to enhance the performance of multiple instance learning.
The proposed method is evaluated on both CAMELYON16 and BRACS datasets, achieving an AUC of 97.13% and 85.85%, respectively.
arXiv Detail & Related papers (2024-06-02T08:53:45Z) - Pairwise Similarity Distribution Clustering for Noisy Label Learning [0.0]
Noisy label learning aims to train deep neural networks using a large amount of samples with noisy labels.
We propose a simple yet effective sample selection algorithm to divide the training samples into one clean set and another noisy set.
Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T11:30:22Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to build in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Additional Positive Enables Better Representation Learning for Medical Images [17.787804928943057]
This paper presents a new way to identify additional positive pairs for BYOL, a state-of-the-art (SOTA) self-supervised learning framework.
For each image, we select the most similar sample from other images as the additional positive and pull their features together with BYOL loss.
Experimental results on two public medical datasets demonstrate that the proposed method can improve the classification performance.
arXiv Detail & Related papers (2023-05-31T18:37:02Z) - CADet: Fully Self-Supervised Out-Of-Distribution Detection With Contrastive Learning [10.876763955414576]
This work explores the use of self-supervised contrastive learning for the simultaneous detection of two types of OOD samples.
First, we pair self-supervised contrastive learning with the maximum mean discrepancy (MMD) two-sample test.
Motivated by this success, we introduce CADet, a novel method for OOD detection of single samples.
arXiv Detail & Related papers (2022-10-04T17:02:37Z) - Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z) - Contrastive Principal Component Learning: Modeling Similarity by Augmentation Overlap [50.48888534815361]
We propose a novel Contrastive Principal Component Learning (CPCL) method composed of a contrastive-like loss and an on-the-fly projection loss.
By CPCL, the learned low-dimensional embeddings theoretically preserve the similarity of augmentation distribution between samples.
arXiv Detail & Related papers (2022-06-01T13:03:58Z) - AdaPT-GMM: Powerful and robust covariate-assisted multiple testing [0.7614628596146599]
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control.
Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme.
We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power.
arXiv Detail & Related papers (2021-06-30T05:06:18Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Similarity Transfer for Knowledge Distillation [25.042405967561212]
Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one.
We propose a novel method called similarity transfer for knowledge distillation (STKD), which aims to fully utilize the similarities between categories of multiple samples.
Results show that STKD substantially outperforms vanilla knowledge distillation and achieves superior accuracy over state-of-the-art knowledge distillation methods.
arXiv Detail & Related papers (2021-03-18T06:54:59Z) - Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we set the class distribution of the original sample and its augmented version as positive sample pairs.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, the two contrastive losses successfully constrain the clustering results of mini-batch samples at both the sample and class levels.
arXiv Detail & Related papers (2021-03-09T15:15:32Z) - Center-wise Local Image Mixture For Contrastive Representation Learning [37.806687971373954]
Contrastive learning based on instance discrimination trains a model to discriminate different transformations of the anchor sample from other samples.
This paper proposes a new kind of contrastive learning method, named CLIM, which uses positives from other samples in the dataset.
We reach 75.5% top-1 accuracy with linear evaluation over ResNet-50, and 59.3% top-1 accuracy when fine-tuned with only 1% labels.
arXiv Detail & Related papers (2020-11-05T08:20:31Z) - Conditional Negative Sampling for Contrastive Learning of Visual Representations [19.136685699971864]
We show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations.
We introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive.
We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE.
arXiv Detail & Related papers (2020-10-05T14:17:32Z) - CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances [77.28192419848901]
We propose a simple yet effective method named contrasting shifted instances (CSI).
In addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself.
Our experiments demonstrate the superiority of our method under various novelty detection scenarios.
arXiv Detail & Related papers (2020-07-16T08:32:56Z) - Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference [73.23326654892963]
We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network.
Our method matches units almost exactly on counts of unique subgraphs within their neighborhood graphs.
arXiv Detail & Related papers (2020-03-02T15:21:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.