Towards a Smaller Student: Capacity Dynamic Distillation for Efficient
Image Retrieval
- URL: http://arxiv.org/abs/2303.09230v2
- Date: Wed, 31 May 2023 15:32:48 GMT
- Title: Towards a Smaller Student: Capacity Dynamic Distillation for Efficient
Image Retrieval
- Authors: Yi Xie, Huaidong Zhang, Xuemiao Xu, Jianqing Zhu, Shengfeng He
- Abstract summary: Previous Knowledge Distillation based efficient image retrieval methods employs a lightweight network as the student model for fast inference.
We propose a Capacity Dynamic Distillation framework, which constructs a student model with editable representation capacity.
Our method has superior inference speed and accuracy, e.g., on the VeRi-776 dataset, given the ResNet101 as a teacher.
- Score: 49.01637233471453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous Knowledge Distillation based efficient image retrieval methods
employs a lightweight network as the student model for fast inference. However,
the lightweight student model lacks adequate representation capacity for
effective knowledge imitation during the most critical early training period,
causing final performance degeneration. To tackle this issue, we propose a
Capacity Dynamic Distillation framework, which constructs a student model with
editable representation capacity. Specifically, the employed student model is
initially a heavy model to fruitfully learn distilled knowledge in the early
training epochs, and the student model is gradually compressed during the
training. To dynamically adjust the model capacity, our dynamic framework
inserts a learnable convolutional layer within each residual block in the
student model as the channel importance indicator. The indicator is optimized
simultaneously by the image retrieval loss and the compression loss, and a
retrieval-guided gradient resetting mechanism is proposed to release the
gradient conflict. Extensive experiments show that our method has superior
inference speed and accuracy, e.g., on the VeRi-776 dataset, given the
ResNet101 as a teacher, our method saves 67.13% model parameters and 65.67%
FLOPs (around 24.13% and 21.94% higher than state-of-the-arts) without
sacrificing accuracy (around 2.11% mAP higher than state-of-the-arts).
Related papers
- Efficient Point Cloud Classification via Offline Distillation Framework and Negative-Weight Self-Distillation Technique [46.266960248570086]
We introduce an innovative offline recording strategy that avoids the simultaneous loading of both teacher and student models.
This approach feeds a multitude of augmented samples into the teacher model, recording both the data augmentation parameters and the corresponding logit outputs.
Experimental results demonstrate that the proposed distillation strategy enables the student model to achieve performance comparable to state-of-the-art models.
arXiv Detail & Related papers (2024-09-03T16:12:12Z) - One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z) - A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Rich Feature Distillation with Feature Affinity Module for Efficient
Image Dehazing [1.1470070927586016]
This work introduces a simple, lightweight, and efficient framework for single-image haze removal.
We exploit rich "dark-knowledge" information from a lightweight pre-trained super-resolution model via the notion of heterogeneous knowledge distillation.
Our experiments are carried out on the RESIDE-Standard dataset to demonstrate the robustness of our framework to the synthetic and real-world domains.
arXiv Detail & Related papers (2022-07-13T18:32:44Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight- parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Self-Feature Regularization: Self-Feature Distillation Without Teacher
Models [0.0]
Self-Feature Regularization(SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We firstly use generalization-l2 loss to match local features and a many-to-one approach to distill more intensively in the channel dimension.
arXiv Detail & Related papers (2021-03-12T15:29:00Z) - Beyond Self-Supervision: A Simple Yet Effective Network Distillation
Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from off-the-shelf pre-trained big powerful models.
Our solution performs distillation by only driving prediction of the student model consistent with that of the teacher model.
We empirically find that such simple distillation settings perform extremely effective, for example, the top-1 accuracy on ImageNet-1k validation set of MobileNetV3-large and ResNet50-D can be significantly improved.
arXiv Detail & Related papers (2021-03-10T09:32:44Z) - Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to a teacher model in the whole distillation.
Most of the existing methods allocate an equal weight to every teacher model.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of student models distilled.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.