Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation
- URL: http://arxiv.org/abs/2511.16653v1
- Date: Thu, 20 Nov 2025 18:56:05 GMT
- Title: Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation
- Authors: Md. Samiul Alim, Sharjil Khan, Amrijit Biswas, Fuad Rahman, Shafin Rahman, Nabeel Mohammed
- Abstract summary: Unstructured pruning remains a powerful strategy for compressing deep neural networks. We introduce a novel teacher-guided pruning framework that tightly integrates Knowledge Distillation (KD) with importance score estimation. Our method facilitates a one-shot global pruning strategy that efficiently eliminates redundant weights while preserving essential representations.
- Score: 7.870062030206608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unstructured pruning remains a powerful strategy for compressing deep neural networks, yet it often demands iterative train-prune-retrain cycles, resulting in significant computational overhead. To address this challenge, we introduce a novel teacher-guided pruning framework that tightly integrates Knowledge Distillation (KD) with importance score estimation. Unlike prior approaches that apply KD as a post-pruning recovery step, our method leverages gradient signals informed by the teacher during importance score calculation to identify and retain parameters most critical for both task performance and knowledge transfer. Our method facilitates a one-shot global pruning strategy that efficiently eliminates redundant weights while preserving essential representations. After pruning, we employ sparsity-aware retraining with and without KD to recover accuracy without reactivating pruned connections. Comprehensive experiments across multiple image classification benchmarks, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our method consistently achieves high sparsity levels with minimal performance degradation. Notably, our approach outperforms state-of-the-art baselines such as EPG and EPSD at high sparsity levels, while offering a more computationally efficient alternative to iterative pruning schemes like COLT. The proposed framework offers a computation-efficient, performance-preserving solution well suited for deployment in resource-constrained environments.
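The abstract describes a three-stage pipeline: teacher-informed importance scoring, one-shot global pruning, and sparsity-aware retraining that never reactivates pruned connections. Below is a minimal PyTorch-style sketch of such a pipeline, assuming a standard temperature-scaled KD loss and a first-order |weight x gradient| importance score; the function names, the scoring rule, and all hyperparameters are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a teacher-guided one-shot pruning pipeline.
# Scoring rule, names, and hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Standard KD objective: cross-entropy on labels plus temperature-scaled KL to the teacher."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl


@torch.no_grad()
def apply_masks(model, masks):
    """Zero out pruned weights in place."""
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])


def teacher_guided_scores(student, teacher, loader, device, num_batches=10):
    """Accumulate |w * dL/dw| where L is the KD-informed loss, so the score reflects
    both task performance and knowledge transfer (assumed scoring rule for this sketch)."""
    scores = {n: torch.zeros_like(p) for n, p in student.named_parameters() if p.dim() > 1}
    teacher.eval()
    for i, (x, y) in enumerate(loader):
        if i >= num_batches:
            break
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        student.zero_grad()
        loss = kd_loss(student(x), t_logits, y)
        loss.backward()
        for n, p in student.named_parameters():
            if n in scores and p.grad is not None:
                scores[n] += (p.detach() * p.grad.detach()).abs()
    return scores


def one_shot_global_prune(student, scores, sparsity=0.9):
    """Single global threshold over all scored tensors; weights below it are removed at once."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(sparsity * flat.numel()))
    threshold = torch.kthvalue(flat, k).values
    masks = {n: (s > threshold).float() for n, s in scores.items()}
    apply_masks(student, masks)
    return masks


def sparsity_aware_step(student, masks, optimizer):
    """One retraining step that keeps pruned connections inactive."""
    for n, p in student.named_parameters():
        if n in masks and p.grad is not None:
            p.grad.mul_(masks[n])  # block gradient flow into pruned weights
    optimizer.step()
    apply_masks(student, masks)  # undo any drift from momentum / weight decay
```

In this sketch, masking the gradients before the optimizer step and the weights after it is what prevents pruned connections from being reactivated during retraining; the same retraining loop can be run with or without the KD term by adjusting the loss, mirroring the two recovery settings mentioned in the abstract.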
Related papers
- ERDE: Entropy-Regularized Distillation for Early-exit [1.3403105494381726]
Deep neural networks have demonstrated state-of-the-art performance in image classification with relatively high efficiency. However, their high computational cost often renders them impractical for real-time and edge applications. The proposed method integrates two well-established optimization techniques: early exits and knowledge distillation.
arXiv Detail & Related papers (2025-10-06T14:45:41Z) - CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs [53.749193998004166]
Curriculum learning plays a crucial role in enhancing the training efficiency of large language models. We propose CurES, an efficient training method that accelerates convergence and employs Bayesian posterior estimation to minimize computational overhead.
arXiv Detail & Related papers (2025-10-01T15:41:27Z) - Local Dense Logit Relations for Enhanced Knowledge Distillation [12.350115738581223]
Local Logit Distillation captures inter-class relationships and recombines logit information. We introduce an Adaptive Decay Weight (ADW) strategy that dynamically adjusts the weights for critical category pairs. Our method improves the student's performance by transferring fine-grained knowledge and emphasizing the most critical relationships.
arXiv Detail & Related papers (2025-07-21T16:25:38Z) - EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z) - SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem from the lens of Seq FT.
Considering that the overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner to selectively reduce the learning rate of backbone parameters, and a Classifier Alignment to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z) - Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training [2.895034191799291]
Pruning schemes create extra overhead, either through iterative training and fine-tuning for static pruning or through repeated computation of a dynamic pruning graph. We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks. Our results on CIFAR-10, CIFAR-100, and Tiny ImageNet suggest that our scheme can remove 50% of connections in deep networks with only a 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z) - Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
arXiv Detail & Related papers (2021-11-01T11:23:44Z) - Pruning with Compensation: Efficient Channel Pruning for Deep Convolutional Neural Networks [0.9712140341805068]
A highly efficient pruning method is proposed to significantly reduce the cost of pruning DCNNs.
Our method shows competitive pruning performance among the state-of-the-art retraining-based pruning methods.
arXiv Detail & Related papers (2021-08-31T10:17:36Z) - Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning [73.24988226158497]
We consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL).
We propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation.
Our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks.
arXiv Detail & Related papers (2021-06-17T17:56:08Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)