Reducing Capacity Gap in Knowledge Distillation with Review Mechanism
for Crowd Counting
- URL: http://arxiv.org/abs/2206.05475v1
- Date: Sat, 11 Jun 2022 09:11:42 GMT
- Title: Reducing Capacity Gap in Knowledge Distillation with Review Mechanism
for Crowd Counting
- Authors: Yunxin Liu, Qiaosi Yi, Jinshan Zeng
- Abstract summary: This paper introduces a novel review mechanism into KD models, motivated by how humans review material during study.
The effectiveness of ReviewKD is demonstrated by a set of experiments over six benchmark datasets.
We also show that the suggested review mechanism can be used as a plug-and-play module to further boost the performance of heavy crowd counting models.
- Score: 16.65360204274379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lightweight crowd counting models, in particular knowledge
distillation (KD) based models, have attracted rising attention in recent
years due to their computational efficiency and modest hardware requirements.
However, existing KD based models usually suffer from the capacity gap issue,
so that the performance of the student network is limited by the teacher
network. In this paper, we address this issue by introducing a novel review
mechanism into KD models, motivated by how humans review material during
study. The proposed model is therefore dubbed ReviewKD. It consists of an
instruction phase and a review phase: in the instruction phase, a well-trained
heavy teacher network transfers its latent features to a lightweight student
network; in the review phase, a refined estimate of the density map is
produced from the learned features through the review mechanism. The
effectiveness of ReviewKD is demonstrated by a set of experiments over six
benchmark datasets, comparing it to state-of-the-art models. Numerical results
show that ReviewKD outperforms existing lightweight models for crowd counting,
effectively alleviates the capacity gap issue, and in particular can surpass
the performance of the teacher network. Beyond lightweight models, we also
show that the suggested review mechanism can be used as a plug-and-play module
to further boost the performance of heavy crowd counting models without
modifying the neural network architecture or introducing any additional model
parameters.
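To make the two-phase pipeline in the abstract concrete, here is a minimal
PyTorch-style sketch. All names (StudentNet, ReviewModule, reviewkd_loss) and
layer choices are hypothetical and are not the authors' released
implementation: the instruction phase aligns the student's latent features
with frozen teacher features, and the review phase refines the density map
from those learned features.

```python
# Hedged sketch of the instruction/review phases described in the abstract.
# Module names, layer sizes, and the loss weighting are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentNet(nn.Module):
    """Toy lightweight backbone producing latent features and a coarse density map."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(feat_dim, 1, 1)  # coarse density map

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.head(feat)

class ReviewModule(nn.Module):
    """Stand-in review phase: refine the density estimate from the learned features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_dim + 1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, feat, coarse_map):
        return self.refine(torch.cat([feat, coarse_map], dim=1))

def reviewkd_loss(student_feat, teacher_feat, coarse_map, refined_map, gt_map):
    # Instruction phase: align student features with the (frozen) teacher features.
    instruct = F.mse_loss(student_feat, teacher_feat.detach())
    # Supervise both the coarse and the review-refined density estimates.
    density = F.mse_loss(coarse_map, gt_map) + F.mse_loss(refined_map, gt_map)
    return instruct + density

if __name__ == "__main__":
    x = torch.randn(2, 3, 64, 64)          # dummy crowd images
    gt = torch.rand(2, 1, 64, 64)          # dummy ground-truth density maps
    student, review = StudentNet(), ReviewModule()
    feat, coarse = student(x)
    refined = review(feat, coarse)
    teacher_feat = torch.randn_like(feat)  # stand-in for teacher latent features
    loss = reviewkd_loss(feat, teacher_feat, coarse, refined, gt)
    loss.backward()
    print(float(loss))
```

Note that the abstract states the plug-and-play use of the review mechanism
adds no extra model parameters to heavy models; the small convolutional
refinement head above is only a device to keep this sketch runnable, not a
claim about how the paper implements the review phase.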
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods across various model architectures and sizes, while reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, whose theoretical properties the authors unveil and leverage (a brief sketch of the skew-KL formulation appears after this list), and (2) an adaptive off-policy approach designed to improve the efficiency of utilizing student-generated outputs.
arXiv Detail & Related papers (2024-02-06T11:10:35Z) - Comparative Knowledge Distillation [102.35425896967791]
Traditional Knowledge Distillation (KD) assumes readily available access to teacher models for frequent inference.
We propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples.
CKD consistently outperforms state-of-the-art data augmentation and KD techniques.
arXiv Detail & Related papers (2023-11-03T21:55:33Z) - Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
Knowledge distillation is a popular approach for enhancing the performance of "student" models.
The underlying mechanics behind knowledge distillation (KD) are still not fully understood.
We show that KD can be interpreted as a novel type of variance reduction mechanism.
arXiv Detail & Related papers (2023-05-27T21:25:55Z) - Knowledge Distillation with Representative Teacher Keys Based on
Attention Mechanism for Image Classification Model Compression [1.503974529275767]
Knowledge distillation (KD) has been recognized as one of the effective methods of model compression for reducing model parameters.
Inspired by the attention mechanism, we propose a novel KD method called representative teacher key (RTK).
Our proposed RTK can effectively improve the classification accuracy of the state-of-the-art attention-based KD method.
arXiv Detail & Related papers (2022-06-26T05:08:50Z) - How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
arXiv Detail & Related papers (2021-10-22T21:30:53Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
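As a brief illustration of the skew Kullback-Leibler divergence mentioned in
the DistiLLM entry above: a common formulation, which DistiLLM builds on,
mixes the two distributions before taking the KL term,
SKL_alpha(p, q) = KL(p || alpha * p + (1 - alpha) * q). The snippet below is a
minimal sketch of that general formulation for categorical distributions; the
function name, the choice of alpha, and the batching are illustrative only and
not DistiLLM's exact API.

```python
# Hedged sketch of a skew KL divergence for categorical distributions.
# Assumed formulation: SKL_alpha(p, q) = KL(p || alpha * p + (1 - alpha) * q).
import torch

def skew_kl(p: torch.Tensor, q: torch.Tensor, alpha: float = 0.1,
            eps: float = 1e-8) -> torch.Tensor:
    """KL(p || alpha * p + (1 - alpha) * q), averaged over the batch."""
    m = alpha * p + (1 - alpha) * q  # skewed mixture of the two distributions
    kl = (p * (torch.log(p + eps) - torch.log(m + eps))).sum(dim=-1)
    return kl.mean()

if __name__ == "__main__":
    p = torch.softmax(torch.randn(4, 10), dim=-1)  # e.g. teacher token distribution
    q = torch.softmax(torch.randn(4, 10), dim=-1)  # e.g. student token distribution
    print(float(skew_kl(p, q, alpha=0.1)))
```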
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.