Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification
- URL: http://arxiv.org/abs/2511.18826v1
- Date: Mon, 24 Nov 2025 07:02:22 GMT
- Title: Uncertainty-Aware Dual-Student Knowledge Distillation for Efficient Image Classification
- Authors: Aakash Gore, Anoushka Dey, Aryan Mishra
- Abstract summary: This paper proposes an uncertainty-aware dual-student knowledge distillation framework. We introduce a peer-learning mechanism where two heterogeneous student architectures learn collaboratively from both the teacher network and each other.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation has emerged as a powerful technique for model compression, enabling the transfer of knowledge from large teacher networks to compact student models. However, traditional knowledge distillation methods treat all teacher predictions equally, regardless of the teacher's confidence in those predictions. This paper proposes an uncertainty-aware dual-student knowledge distillation framework that leverages teacher prediction uncertainty to selectively guide student learning. We introduce a peer-learning mechanism where two heterogeneous student architectures, specifically ResNet-18 and MobileNetV2, learn collaboratively from both the teacher network and each other. Experimental results on ImageNet-100 demonstrate that our approach achieves superior performance compared to baseline knowledge distillation methods, with ResNet-18 achieving 83.84% top-1 accuracy and MobileNetV2 achieving 81.46% top-1 accuracy, representing improvements of 2.04% and 0.92%, respectively, over traditional single-student distillation approaches.
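The abstract does not spell out the training objective. As a minimal PyTorch sketch of the general idea only: the teacher's per-sample predictive entropy down-weights the soft-target term, and a peer term pulls each student toward the other's output. The temperature, loss weights, and the entropy-based confidence measure are all assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_kd_loss(student_logits, peer_logits, teacher_logits,
                                 labels, T=4.0, alpha=0.5, beta=0.3):
    """Hypothetical uncertainty-aware dual-student KD loss (illustrative only)."""
    # Hard-label cross-entropy on the student's own predictions.
    ce = F.cross_entropy(student_logits, labels)

    # Per-sample teacher confidence: 1 - normalized predictive entropy.
    p_t = F.softmax(teacher_logits, dim=1)
    entropy = -(p_t * p_t.clamp_min(1e-8).log()).sum(dim=1)
    max_entropy = torch.log(torch.tensor(float(p_t.size(1)), device=p_t.device))
    confidence = 1.0 - entropy / max_entropy

    # Soft-target distillation, down-weighted where the teacher is uncertain.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction='none').sum(dim=1)
    kd = (confidence * kd).mean() * (T * T)

    # Peer-learning term: match the other student's (detached) soft output.
    peer = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(peer_logits.detach() / T, dim=1),
                    reduction='batchmean') * (T * T)

    return ce + alpha * kd + beta * peer
```

Under this reading, each student (ResNet-18 and MobileNetV2) would be trained with its own copy of this loss, with the other student supplying `peer_logits`.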
Related papers
- Distilling Calibrated Student from an Uncalibrated Teacher [8.101116303448586]
We study how to obtain a calibrated student from an uncalibrated teacher.
Our approach relies on the fusion of data-augmentation techniques, including but not limited to cutout, mixup, and CutMix.
We extend our approach beyond traditional knowledge distillation and find it effective in those settings as well.
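The summary does not say how the augmentations are fused; a minimal sketch, assuming a PyTorch training loop and a simple pick-one-per-batch rule (the rule and hyperparameters are assumptions), might look like:

```python
import random
import numpy as np
import torch

def mixup(x, y, alpha=1.0):
    """Standard mixup: convex combination of a batch with a shuffled copy.
    The loss becomes lam * CE(out, y_a) + (1 - lam) * CE(out, y_b)."""
    lam = float(np.random.beta(alpha, alpha))
    idx = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[idx], y, y[idx], lam

def fused_augment(x, y, augmentations):
    """Hypothetical fusion rule: pick one augmentation per batch at random.
    Cutout and CutMix implementations would plug into the list the same way."""
    return random.choice(augmentations)(x, y)

# Usage sketch: x_aug, y_a, y_b, lam = fused_augment(x, y, [mixup])
```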
arXiv Detail & Related papers (2023-02-22T16:18:38Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine-grained face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
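The summary leaves AKD's construction unspecified; purely as a generic illustration, a training step that distills a (presumably robust) teacher on adversarial examples could look like the following, with the single FGSM step, temperature, and loss weight all being assumptions:

```python
import torch
import torch.nn.functional as F

def adversarial_distillation_step(student, teacher, x, y,
                                  eps=8 / 255, T=4.0, alpha=0.5):
    """Generic sketch: distill a robust teacher on adversarial examples.
    AKD's actual attack, schedule, and loss weighting are not given above."""
    # One FGSM step against the student to craft adversarial inputs.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(student(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

    # Distill the teacher's soft predictions on the adversarial inputs.
    s_logits = student(x_adv)
    with torch.no_grad():
        t_logits = teacher(x_adv)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction='batchmean') * (T * T)
    return (1 - alpha) * F.cross_entropy(s_logits, y) + alpha * kd
```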
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- Estimating and Maximizing Mutual Information for Knowledge Distillation [24.254198219979667]
We propose Mutual Information Maximization Knowledge Distillation (MIMKD).
Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information of local and global feature representations between a teacher and a student network.
This can be used to improve the performance of low capacity models by transferring knowledge from more performant but computationally expensive models.
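A standard way to estimate and maximize such a lower bound is an InfoNCE-style contrastive loss; the sketch below is illustrative, assuming pre-projected (batch, dim) feature vectors, and is not MIMKD's exact local/global formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_mi_loss(f_student, f_teacher, temperature=0.1):
    """Illustrative InfoNCE objective: minimizing this cross-entropy
    maximizes a lower bound on the mutual information between matching
    student and teacher features (matching rows = same input image)."""
    f_s = F.normalize(f_student, dim=1)
    f_t = F.normalize(f_teacher.detach(), dim=1)  # teacher is not updated
    logits = f_s @ f_t.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(f_s.size(0), device=f_s.device)
    return F.cross_entropy(logits, targets)       # diagonal entries are positives
```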
arXiv Detail & Related papers (2021-10-29T17:49:56Z)
- Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation [20.17325172100031]
We propose a novel student-helping-teacher formula, Teacher Evolution via Self-Knowledge Distillation (TESKD), where the target teacher is learned with the help of multiple hierarchical students by sharing the structural backbone.
The effectiveness of our proposed framework is demonstrated by extensive experiments with various network settings on two standard benchmarks including CIFAR-100 and ImageNet.
arXiv Detail & Related papers (2021-10-01T11:46:12Z)
- Multi-granularity for knowledge distillation [3.3970049571884204]
Students have different abilities to understand the knowledge imparted by teachers.
A multi-granularity self-analyzing module of the teacher network is designed.
A stable excitation scheme is proposed to provide robust supervision during student training.
arXiv Detail & Related papers (2021-08-15T07:47:08Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup [91.1317510066954]
We study a little-explored but important question, i.e., knowledge distillation efficiency.
Our goal is to achieve a performance comparable to conventional knowledge distillation with a lower computation cost during training.
We show that the UNcertainty-aware mIXup (UNIX) can serve as a clean yet effective solution.
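The summary gives no implementation details for UNIX; a hypothetical sketch of the general idea (mix only the samples the student is most uncertain about, so the costly teacher forward pass runs on fewer inputs) follows, with the selection rule and ratios as assumptions:

```python
import torch
import torch.nn.functional as F

def uncertainty_aware_mixup(x, student_logits, keep_ratio=0.5, alpha=1.0):
    """Hypothetical UNIX-style step: rank samples by the student's predictive
    entropy, keep the most uncertain ones, and mixup among them before the
    teacher forward pass. The actual UNIX selection and mixing rules may differ."""
    with torch.no_grad():
        p = F.softmax(student_logits, dim=1)
        entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1)
    k = max(1, int(keep_ratio * x.size(0)))
    keep = entropy.topk(k).indices                     # most uncertain samples
    x_sel = x[keep]
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(k, device=x.device)
    x_mixed = lam * x_sel + (1.0 - lam) * x_sel[idx]   # teacher sees only these
    return x_mixed, keep, idx, lam
```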
arXiv Detail & Related papers (2020-12-17T06:52:16Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
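One concrete way to exploit the similarity between self-supervision signals is to have the student match the teacher's pairwise feature similarities over a batch of augmented views; the sketch below is illustrative and not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def similarity_transfer_loss(f_student, f_teacher, T=0.5):
    """Sketch of transferring self-supervised similarity structure: the
    student mimics the teacher's batch-wise feature-similarity distribution.
    The paper's actual transform set and objective may differ."""
    s = F.normalize(f_student, dim=1)
    t = F.normalize(f_teacher.detach(), dim=1)
    sim_s = F.log_softmax(s @ s.t() / T, dim=1)  # student pairwise similarities
    sim_t = F.softmax(t @ t.t() / T, dim=1)      # teacher pairwise similarities
    return F.kl_div(sim_s, sim_t, reduction='batchmean')
```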
arXiv Detail & Related papers (2020-06-12T12:18:52Z)