Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized
Deep Neural Networks
- URL: http://arxiv.org/abs/2009.14502v1
- Date: Wed, 30 Sep 2020 08:38:37 GMT
- Title: Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized
Deep Neural Networks
- Authors: Yoonho Boo, Sungho Shin, Jungwook Choi, and Wonyong Sung
- Abstract summary: The quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices.
Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks.
In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ).
- Score: 27.533162215182422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quantization of deep neural networks (QDNNs) has been actively studied
for deployment in edge devices. Recent studies employ the knowledge
distillation (KD) method to improve the performance of quantized networks. In
this study, we propose stochastic precision ensemble training for QDNNs (SPEQ).
SPEQ is a knowledge distillation training scheme; however, the teacher is
formed by sharing the model parameters of the student network. We obtain the
soft labels of the teacher by changing the bit precision of the activation
stochastically at each layer of the forward-pass computation. The student model
is trained with these soft labels to reduce the activation quantization noise.
The cosine similarity loss is employed, instead of the KL-divergence, for KD
training. As the teacher model changes continuously by random bit-precision
assignment, it exploits the effect of stochastic ensemble KD. SPEQ outperforms
the existing quantization training methods in various tasks, such as image
classification, question-answering, and transfer learning without the need for
cumbersome teacher networks.
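To make the training scheme concrete, the following is a minimal PyTorch sketch of the idea described in the abstract: the teacher shares the student's parameters but quantizes each layer's activation at a randomly chosen bit precision during its forward pass, and the student is distilled with a cosine-similarity loss on the resulting soft labels. The toy MLP, the candidate bit-widths, the clipping-based activation quantizer, and the loss weighting are illustrative assumptions, not the paper's exact configuration.
```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_activation(x, bits):
    """Uniform activation quantization with a straight-through gradient."""
    x = x.clamp(0.0, 1.0)                      # assume activations live in [0, 1]
    levels = 2 ** bits - 1
    q = torch.round(x * levels) / levels
    return x + (q - x).detach()                # forward: quantized, backward: identity

class QuantizedMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, classes)

    def forward(self, x, bit_widths):
        # bit_widths: one activation precision per hidden layer
        h = quantize_activation(torch.sigmoid(self.fc1(x)), bit_widths[0])
        h = quantize_activation(torch.sigmoid(self.fc2(h)), bit_widths[1])
        return self.out(h)

def speq_step(model, optimizer, x, y, student_bits=2, candidate_bits=(2, 4, 8), alpha=1.0):
    # Teacher pass: the same parameters, but a random activation precision per layer.
    with torch.no_grad():
        teacher_bits = [random.choice(candidate_bits) for _ in range(2)]
        teacher_logits = model(x, teacher_bits)
    # Student pass: fixed low-precision activations.
    student_logits = model(x, [student_bits, student_bits])
    # Cosine-similarity distillation (instead of KL divergence) plus the task loss.
    kd = 1.0 - F.cosine_similarity(student_logits, teacher_logits, dim=1).mean()
    loss = F.cross_entropy(student_logits, y) + alpha * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
Because the teacher reuses the student's weights, no separate teacher network needs to be stored; the random per-layer precision assignment is what produces the stochastic ensemble effect described in the abstract.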
Related papers
- AICSD: Adaptive Inter-Class Similarity Distillation for Semantic
Segmentation [12.92102548320001]
This paper proposes a novel knowledge distillation method called Inter-Class Similarity Distillation (ICSD).
The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs.
Experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method.
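As a hedged illustration of the inter-class similarity idea summarized above, the sketch below builds a per-class spatial distribution from segmentation logits, forms a matrix of pairwise divergences between those distributions, and makes the student's matrix match the teacher's. The softmax-over-pixels normalization and the MSE matching loss are assumptions for illustration and may differ from the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def class_distributions(logits):
    # logits: (B, C, H, W) -> per-class distribution over spatial positions
    b, c, h, w = logits.shape
    return F.softmax(logits.reshape(b, c, h * w), dim=-1)      # (B, C, H*W)

def inter_class_similarity(dist, eps=1e-8):
    # Pairwise KL divergences between class distributions: (B, C, C)
    log_p = (dist + eps).log()
    return (dist.unsqueeze(2) * (log_p.unsqueeze(2) - log_p.unsqueeze(1))).sum(-1)

def icsd_loss(student_logits, teacher_logits):
    s = inter_class_similarity(class_distributions(student_logits))
    with torch.no_grad():
        t = inter_class_similarity(class_distributions(teacher_logits))
    return F.mse_loss(s, t)

# Example with 19-class, Cityscapes-like toy tensors
student_out = torch.randn(2, 19, 64, 64, requires_grad=True)
teacher_out = torch.randn(2, 19, 64, 64)
icsd_loss(student_out, teacher_out).backward()
```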
arXiv Detail & Related papers (2023-08-08T13:17:20Z)
- IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning aims to incrementally absorb new concepts without interfering with previously learned knowledge.
Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net)
IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method, dubbed DiffKD, that explicitly denoises and matches features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between the self-supervised learning (SSL) and dynamic computation (DC) paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
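For context, the sketch below shows plain interval bound propagation (IBP) through a single linear layer whose weights have been uniformly quantized; it illustrates the kind of certified bound computation that quantization-aware robust training builds on, not the paper's full QA-IBP procedure. The symmetric per-tensor weight quantizer and the single-layer setting are simplifying assumptions.
```python
import torch

def quantize_weights(w, bits=8):
    # Symmetric uniform quantization of weights to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def ibp_linear(lower, upper, w, b):
    # Propagate an elementwise interval [lower, upper] through x -> x @ w.T + b.
    mid = (upper + lower) / 2
    rad = (upper - lower) / 2
    out_mid = mid @ w.t() + b
    out_rad = rad @ w.abs().t()
    return out_mid - out_rad, out_mid + out_rad

# Certify an L-infinity ball of radius eps around an input point.
torch.manual_seed(0)
w = quantize_weights(torch.randn(10, 32))
b = torch.zeros(10)
x, eps = torch.randn(1, 32), 0.1
lb, ub = ibp_linear(x - eps, x + eps, w, b)
# Any perturbed input within the ball is guaranteed to map into [lb, ub].
```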
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
- Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm [16.340620299847384]
Training quantised neural networks (QNNs) is a non-differentiable problem since weights and features are output by piecewise constant functions.
The standard solution is to apply the straight-through estimator (STE), which uses different functions in the inference (forward) and gradient computation (backward) steps.
Several STE variants have been proposed in the literature aiming to maximise the task accuracy of the trained network.
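The following is a small PyTorch sketch of the basic STE for a uniform activation quantizer: the forward pass applies the piecewise-constant quantization function, while the backward pass substitutes the gradient of a clipped identity. The clipping range and bit-width are illustrative choices; the STE variants studied in the paper replace the backward surrogate with other functions.
```python
import torch

class STEQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, bits):
        ctx.save_for_backward(x)
        levels = 2 ** bits - 1
        return torch.round(x.clamp(0.0, 1.0) * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the gradient straight through inside the clipping range,
        # zero it outside (the "clipped identity" surrogate).
        mask = (x >= 0.0) & (x <= 1.0)
        return grad_output * mask.to(grad_output.dtype), None

x = torch.rand(4, requires_grad=True)
y = STEQuantize.apply(x, 2)        # 2-bit activations in the forward pass
y.sum().backward()                 # gradients flow as if y were the (clipped) input
```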
arXiv Detail & Related papers (2022-03-21T20:14:27Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
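As a hedged illustration of the bit-drop idea, the sketch below decomposes a uniformly quantized value into its bit planes and randomly zeroes individual bits during training, in the spirit of dropout applied to bits rather than neurons. The bit-plane decomposition, the drop probability, and the absence of rescaling or a gradient estimator are simplifying assumptions, not the paper's DropBits formulation.
```python
import torch

def drop_bits(x, bits=4, p=0.2, training=True):
    levels = 2 ** bits - 1
    q = torch.round(x.clamp(0.0, 1.0) * levels).to(torch.int64)   # integer codes
    if not training:
        return q.to(x.dtype) / levels
    out = torch.zeros_like(q)
    for k in range(bits):
        plane = (q >> k) & 1                                       # k-th bit plane
        keep = (torch.rand_like(x) > p).to(torch.int64)            # drop each bit w.p. p
        out += (plane * keep) << k
    return out.to(x.dtype) / levels

x = torch.rand(8)
print(drop_bits(x, bits=4, p=0.2))
```
In an actual training pipeline the bit-dropped quantizer would be combined with a straight-through-style gradient so that the weights below it can still be updated.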
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
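A minimal sketch of the k-NN density estimation ingredient mentioned above: the density at a feature vector is estimated from the distance to its k-th nearest neighbour as p(x) ≈ k / (N · V_d · r_k(x)^d), with V_d the volume of the d-dimensional unit ball. How OSAKD converts these estimates into distillation targets is not shown; the feature dimensionality and k below are arbitrary.
```python
import math

import torch

def knn_density(features, k=5):
    # features: (N, d) batch of feature vectors
    n, d = features.shape
    dists = torch.cdist(features, features)                  # (N, N) pairwise distances
    # Distance to the k-th nearest neighbour, excluding the point itself (distance 0).
    r_k = dists.topk(k + 1, largest=False).values[:, -1]
    v_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)          # unit-ball volume in R^d
    return k / (n * v_d * r_k.clamp_min(1e-12) ** d)

feats = torch.randn(128, 8)
print(knn_density(feats, k=5)[:5])
```
For high-dimensional feature spaces one would typically work with log-densities to avoid the overflow/underflow of the V_d and r_k^d terms.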
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis [8.87104231451079]
Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression.
In traditional KD, the transferred knowledge is usually obtained by feeding training samples to the teacher network.
The original training dataset is not always available due to storage costs or privacy issues.
We propose a novel data-free KD approach by modeling the intermediate feature space of the teacher.
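The sketch below shows a generic form of data-free transfer-set synthesis: starting from random noise, the inputs are optimized so that a frozen teacher assigns them chosen soft targets, which can then be used to distill a student. The Dirichlet-sampled targets and the plain cross-entropy objective are illustrative assumptions, not the intermediate-feature-space modeling proposed in the paper.
```python
import torch
import torch.nn.functional as F

def synthesize_batch(teacher, batch=16, shape=(3, 32, 32), classes=10, steps=200, lr=0.05):
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)                      # only the inputs are optimized
    x = torch.randn(batch, *shape, requires_grad=True)
    # One soft target per synthetic example, sampled from a flat Dirichlet prior.
    targets = torch.distributions.Dirichlet(torch.ones(classes)).sample((batch,))
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        log_probs = F.log_softmax(teacher(x), dim=1)
        loss = -(targets * log_probs).sum(dim=1).mean()   # cross-entropy to soft targets
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach(), targets

# Usage: images, soft_labels = synthesize_batch(teacher) with any frozen classifier,
# then train the student on (images, soft_labels) with a standard KD loss.
```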
arXiv Detail & Related papers (2021-04-10T22:42:14Z)
- Embedded Knowledge Distillation in Depth-level Dynamic Neural Network [8.207403859762044]
We propose an elegant Depth-level Dynamic Neural Network (DDNN) that integrates different-depth sub-nets of similar architectures.
In this article, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN to implement semantic knowledge transfer from the teacher (full) net to multiple sub-nets.
Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in DDNN trained with EKD achieve better performance than depth-level pruning or individual training.
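As a hedged sketch of knowledge transfer from a full-depth network to its own shallower sub-nets, the example below attaches a classification head at every depth, treats the deepest exit as the teacher, and trains each shallower exit with a temperature-scaled KL term in addition to its task loss. The tiny architecture, temperature, and loss weights are illustrative assumptions rather than the paper's EKD mechanism.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDepthNet(nn.Module):
    def __init__(self, dim=64, classes=10, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )
        self.heads = nn.ModuleList(nn.Linear(dim, classes) for _ in range(depth))

    def forward(self, x):
        logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits.append(head(x))           # one exit per depth level
        return logits

def depthwise_kd_loss(logits, y, temperature=4.0, alpha=0.5):
    teacher = logits[-1]                     # deepest exit acts as the teacher
    loss = F.cross_entropy(teacher, y)
    for student in logits[:-1]:
        kd = F.kl_div(
            F.log_softmax(student / temperature, dim=1),
            F.softmax(teacher.detach() / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        loss = loss + F.cross_entropy(student, y) + alpha * kd
    return loss

net = MultiDepthNet()
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
depthwise_kd_loss(net(x), y).backward()
```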
arXiv Detail & Related papers (2021-03-01T06:35:31Z)