iCD: An Implicit Clustering Distillation Method for Structural Information Mining
- URL: http://arxiv.org/abs/2509.12553v1
- Date: Tue, 16 Sep 2025 01:16:13 GMT
- Title: iCD: An Implicit Clustering Distillation Method for Structural Information Mining
- Authors: Xiang Xue, Yatu Ji, Qing-dao-er-ji Ren, Bao Shi, Min Lu, Nier Wu, Xufei Zhuang, Haiteng Xu, Gan-qi-qi-ge Cha
- Abstract summary: implicit Clustering Distillation (iCD) is a simple and effective method that mines and transfers interpretable structural knowledge from logits. Experiments on benchmark datasets demonstrate the effectiveness of iCD across diverse teacher-student architectures.
- Score: 1.3573542141741506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logit Knowledge Distillation has gained substantial research interest in recent years due to its simplicity and lack of requirement for intermediate feature alignment; however, it suffers from limited interpretability in its decision-making process. To address this, we propose implicit Clustering Distillation (iCD): a simple and effective method that mines and transfers interpretable structural knowledge from logits, without requiring ground-truth labels or feature-space alignment. iCD leverages Gram matrices over decoupled local logit representations to enable student models to learn latent semantic structural patterns. Extensive experiments on benchmark datasets demonstrate the effectiveness of iCD across diverse teacher-student architectures, with particularly strong performance in fine-grained classification tasks -- achieving a peak improvement of +5.08% over the baseline. The code is available at: https://github.com/maomaochongaa/iCD.
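As a reading aid, the sketch below illustrates the general idea stated in the abstract: split ("decouple") the logit vector into local groups, compute a Gram matrix of pairwise sample similarities per group, and train the student to match the teacher's Gram matrices without labels. This is a minimal, hypothetical PyTorch sketch; names such as `gram_structural_loss` and `num_groups` are illustrative, and the authors' actual implementation lives in the linked repository.

```python
# Illustrative sketch only -- not the official iCD code (see the GitHub link above).
import torch
import torch.nn.functional as F

def gram_structural_loss(student_logits, teacher_logits, num_groups=4):
    """Match Gram matrices computed over grouped ("local") logit representations.

    student_logits, teacher_logits: tensors of shape (batch, num_classes).
    Each Gram matrix is (batch, batch) and encodes pairwise sample similarity,
    i.e. structural information, so no ground-truth labels are needed.
    """
    def group_grams(logits):
        grams = []
        for chunk in torch.chunk(logits, num_groups, dim=1):   # local logit groups
            z = F.normalize(chunk, dim=1)                       # scale-invariant rows
            grams.append(z @ z.t())                             # (batch, batch) Gram matrix
        return torch.stack(grams)                               # (num_groups, batch, batch)

    return F.mse_loss(group_grams(student_logits),
                      group_grams(teacher_logits.detach()))

# Hypothetical usage: combine with the usual task loss.
# loss = ce_loss + lambda_icd * gram_structural_loss(s_logits, t_logits)
```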
Related papers
- Heterogeneous Complementary Distillation [16.315256873831064]
Heterogeneous Complementary Distillation (HCD) integrates complementary teacher and student features to align representations in shared logits. Experiments on the CIFAR-100, fine-grained (e.g., CUB200), and ImageNet-1K datasets demonstrate that HCD outperforms state-of-the-art KD methods.
arXiv Detail & Related papers (2025-11-14T04:06:33Z)
- SRKD: Towards Efficient 3D Point Cloud Segmentation via Structure- and Relation-aware Knowledge Distillation [25.38025028623991]
3D point cloud segmentation faces practical challenges due to the computational complexity and deployment limitations of large-scale transformer-based models. We propose a novel Structure- and Relation-aware Knowledge Distillation framework, named SRKD, that transfers rich geometric and semantic knowledge from a large frozen teacher model to a lightweight student model. Our method achieves significantly reduced model complexity, demonstrating its effectiveness and efficiency in real-world deployment scenarios.
arXiv Detail & Related papers (2025-06-16T07:32:58Z)
- Adversarial Curriculum Graph-Free Knowledge Distillation for Graph Neural Networks [61.608453110751206]
We propose a fast and high-quality data-free knowledge distillation approach for graph neural networks. The proposed graph-free KD method (ACGKD) significantly reduces the spatial complexity of pseudo-graphs. ACGKD eliminates the dimensional ambiguity between the student and teacher models by increasing the student's dimensions.
arXiv Detail & Related papers (2025-04-01T08:44:27Z)
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes. Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes. We propose a simple yet effective method, termed the Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- Attention-guided Feature Distillation for Semantic Segmentation [8.344263189293578]
This paper showcases the efficacy of a simple yet powerful method for utilizing refined feature maps to transfer attention. The proposed Attention-guided Feature Distillation (AttnFD) method employs the Convolutional Block Attention Module (CBAM). It achieves state-of-the-art results in terms of improving the mean Intersection over Union (mIoU) of the student network on the PASCAL VOC 2012, Cityscapes, COCO, and CamVid datasets.
arXiv Detail & Related papers (2024-03-08T16:57:47Z)
- Distilling Privileged Multimodal Information for Expression Recognition using Optimal Transport [46.91791643660991]
Deep learning models for multimodal expression recognition have reached remarkable performance in controlled laboratory environments.
These models struggle in the wild because the modalities used during training may be unavailable or of degraded quality at deployment time.
In practice, only a subset of the training-time modalities may be available at test time.
Learning with privileged information enables models to exploit data from additional modalities that are only available during training.
arXiv Detail & Related papers (2024-01-27T19:44:15Z)
- Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning [47.64252639582435]
We focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories. We propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for semi-supervised continual learning.
arXiv Detail & Related papers (2023-12-27T04:40:12Z)
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- Improving Knowledge Distillation via Regularizing Feature Norm and Direction [16.98806338782858]
Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task.
Treating teacher features as knowledge, prevailing methods of knowledge distillation train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features.
While it is natural to believe that better alignment of student features to the teacher better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance.
arXiv Detail & Related papers (2023-05-26T15:05:19Z)
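The entry above refers to the two alignment objectives that prevailing KD methods minimize: the KL divergence between (temperature-softened) logits and the L2 distance between intermediate features. Below is a generic sketch of both losses in PyTorch, for orientation only; it is not code from the cited paper.

```python
# Generic knowledge-distillation losses; a sketch, not any listed paper's code.
import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits.detach() / t, dim=1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (t ** 2)

def feature_kd_loss(student_feat, teacher_feat):
    """L2 distance between intermediate features (assumes matching shapes; in
    practice a projection layer is often added when dimensions differ)."""
    return F.mse_loss(student_feat, teacher_feat.detach())
```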
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is to apply self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
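The entry above applies self-supervised contrastive learning to learn discriminative representations. For orientation, here is a minimal sketch of a standard InfoNCE-style contrastive loss; it is illustrative only and not the DCCL implementation.

```python
# Generic InfoNCE contrastive loss sketch; not the DCCL implementation.
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.07):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same samples.
    For each sample, its other view is the positive; all other samples are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)            # positives lie on the diagonal
```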
- Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation [15.353256018248103]
LiDAR semantic segmentation has gained attention for enabling fine-grained scene understanding.
We present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model.
Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture.
arXiv Detail & Related papers (2023-01-26T14:52:30Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
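Several of the semi-supervised entries above, including the Mask-guided Mean Teacher framework, build on the standard mean-teacher scheme. Below is a generic sketch of its EMA teacher update and consistency loss; the function names and the alpha value are illustrative and not taken from any listed paper.

```python
# Generic mean-teacher sketch; not the Mask-guided Mean Teacher implementation.
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    """The teacher is an EMA copy of the student and receives no gradient updates."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """teacher = alpha * teacher + (1 - alpha) * student, parameter-wise."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

def consistency_loss(student_out, teacher_out):
    """Penalize disagreement between predictions on differently perturbed inputs."""
    return F.mse_loss(F.softmax(student_out, dim=1), F.softmax(teacher_out, dim=1))
```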
This list is automatically generated from the titles and abstracts of the papers in this site.