All at Once Network Quantization via Collaborative Knowledge Transfer
- URL: http://arxiv.org/abs/2103.01435v1
- Date: Tue, 2 Mar 2021 03:09:03 GMT
- Title: All at Once Network Quantization via Collaborative Knowledge Transfer
- Authors: Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko
- Abstract summary: We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
- Score: 56.95849086170461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network quantization has rapidly become one of the most widely used methods
to compress and accelerate deep neural networks on edge devices. While existing
approaches offer impressive results on common benchmark datasets, they
generally repeat the quantization process and retrain the low-precision network
from scratch, leading to different networks tailored for different resource
constraints. This limits scalable deployment of deep networks in many
real-world applications, where in practice dynamic changes in bit-width are
often desired. All at Once quantization addresses this problem, by flexibly
adjusting the bit-width of a single deep network during inference, without
requiring re-training or additional memory to store separate models, for
instant adaptation in different scenarios. In this paper, we develop a novel
collaborative knowledge transfer approach for efficiently training the
all-at-once quantization network. Specifically, we propose an adaptive
selection strategy to choose a high-precision "teacher" for
transferring knowledge to the low-precision student while jointly optimizing
the model with all bit-widths. Furthermore, to effectively transfer knowledge,
we develop a dynamic block swapping method by randomly replacing the blocks in
the lower-precision student network with the corresponding blocks in the
higher-precision teacher network. Extensive experiments on several challenging
and diverse datasets for both image and video classification demonstrate
the efficacy of our proposed approach over state-of-the-art methods.
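
The abstract describes the training recipe only at a high level. Below is a minimal sketch, in PyTorch-style Python, of how one training step combining joint multi-bit-width optimization, teacher selection, and block swapping could look; the `model(x, bit_width=...)` interface, the accuracy-based teacher-selection rule, `blocks_per_bit`, and `model.head` are illustrative assumptions, not the paper's actual implementation.

```python
import random
import torch
import torch.nn.functional as F

def train_step(model, blocks_per_bit, x, y, bit_widths=(8, 6, 4, 2), swap_prob=0.3):
    """One joint step over all bit-widths (hypothetical model interface)."""
    # Forward every supported precision once with shared weights.
    logits = {b: model(x, bit_width=b) for b in bit_widths}
    total_loss = sum(F.cross_entropy(logits[b], y) for b in bit_widths)

    for b in bit_widths:
        teachers = [t for t in bit_widths if t > b]
        if not teachers:
            continue
        # Adaptive teacher selection (assumption: pick the higher precision
        # whose current predictions agree best with the labels).
        acc = {t: (logits[t].argmax(1) == y).float().mean().item() for t in teachers}
        t = max(acc, key=acc.get)
        # Distill the low-precision student toward the detached teacher.
        total_loss = total_loss + F.kl_div(
            F.log_softmax(logits[b], dim=1),
            F.softmax(logits[t], dim=1).detach(),
            reduction="batchmean",
        )
        # Dynamic block swapping: randomly replace student blocks with the
        # corresponding teacher blocks and supervise the mixed network.
        mixed = [tb if random.random() < swap_prob else sb
                 for sb, tb in zip(blocks_per_bit[b], blocks_per_bit[t])]
        h = x
        for blk in mixed:
            h = blk(h)
        total_loss = total_loss + F.cross_entropy(model.head(h), y)  # assumed head

    total_loss.backward()
    return total_loss.item()
```

This only illustrates the data flow; the paper's actual selection and swapping criteria are defined in the full text.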
Related papers
- Transfer Learning with Reconstruction Loss [12.906500431427716]
This paper proposes a novel approach to model training by adding to the model an additional reconstruction stage with an associated reconstruction loss.
The proposed approach encourages the learned features to be general and transferable, and therefore can be readily used for efficient transfer learning.
For numerical simulations, three applications are studied: transfer learning on classifying MNIST handwritten digits, the device-to-device wireless network power allocation, and the multiple-input-single-output network downlink beamforming and localization.
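As a rough illustration of the idea (not the paper's architecture), the sketch below pairs a classification head with a reconstruction head on a shared encoder and trains both with a weighted sum of losses; the layer sizes, MNIST-like 784-dimensional input, and weight lam are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierWithReconstruction(nn.Module):
    def __init__(self, in_dim=784, hidden=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, num_classes)  # task head
        self.reconstructor = nn.Linear(hidden, in_dim)    # reconstruction stage

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.reconstructor(z)

def loss_fn(model, x, y, lam=0.5):
    logits, x_hat = model(x)
    # Classification loss plus reconstruction loss on the input.
    return F.cross_entropy(logits, y) + lam * F.mse_loss(x_hat, x)

# Example usage with random data standing in for MNIST-sized inputs.
model = ClassifierWithReconstruction()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss_fn(model, x, y).backward()
```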
arXiv Detail & Related papers (2024-03-31T00:22:36Z)
- Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian Methods [8.72339110741777]
Network slicing is a key technique in 5G and beyond for efficiently supporting diverse services.
Deep learning models suffer from limited generalization and adaptability to dynamic slicing configurations.
We propose a novel framework that integrates constrained optimization methods and deep learning models.
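The summary does not detail the integration, but a generic pattern for coupling a learned allocator with Lagrangian dual updates is sketched below; the toy utility, capacity constraint, network, and step sizes are placeholders, not the paper's formulation.

```python
import torch
import torch.nn as nn

demand_dim, capacity = 4, 1.0
allocator = nn.Sequential(nn.Linear(demand_dim, 32), nn.ReLU(),
                          nn.Linear(32, demand_dim), nn.Softplus())
opt = torch.optim.Adam(allocator.parameters(), lr=1e-3)
lam = torch.tensor(0.0)        # Lagrange multiplier for the capacity constraint
dual_lr = 0.05

for step in range(200):
    demands = torch.rand(64, demand_dim)             # simulated slice demands
    alloc = allocator(demands)
    utility = torch.log1p(alloc).sum(dim=1).mean()   # toy concave utility
    violation = (alloc.sum(dim=1) - capacity).clamp(min=0).mean()

    # Primal step: train the allocator on the Lagrangian objective.
    loss = -(utility - lam * violation)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Dual step: subgradient ascent on the multiplier, kept non-negative.
    with torch.no_grad():
        lam += dual_lr * violation.item()
        lam.clamp_(min=0)
```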
arXiv Detail & Related papers (2024-01-22T07:19:16Z)
- Multilevel-in-Layer Training for Deep Neural Network Regression [1.6185544531149159]
We present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks.
We experimentally show with PDE regression problems that our multilevel training approach is an effective regularizer.
arXiv Detail & Related papers (2022-11-11T23:53:46Z)
- Intrinisic Gradient Compression for Federated Learning [3.9215337270154995]
Federated learning enables a large number of clients to jointly train a machine learning model on privately-held data.
One of the largest barriers to wider adoption of federated learning is the communication cost of sending model updates from and to the clients.
arXiv Detail & Related papers (2021-12-05T19:16:54Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
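As one plausible reading of the bit-drop idea (not necessarily the paper's DropBits formulation), the sketch below quantizes a tensor uniformly and randomly re-quantizes a fraction of elements at one bit less; the drop rate and quantizer are assumptions.

```python
import torch

def uniform_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of w to the given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / levels
    return torch.round(w / scale).clamp(-levels, levels) * scale

def bit_drop_quantize(w: torch.Tensor, bits: int = 4, drop_p: float = 0.1) -> torch.Tensor:
    q_full = uniform_quantize(w, bits)
    q_low = uniform_quantize(w, bits - 1)
    # Randomly "drop a bit" per element by falling back to the lower precision.
    drop_mask = torch.rand_like(w) < drop_p
    return torch.where(drop_mask, q_low, q_full)

w = torch.randn(256)
print(bit_drop_quantize(w, bits=4, drop_p=0.1))
```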
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning [75.45968495410047]
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning.
Gradient quantization is an effective way of reducing the number of bits required to communicate each model update.
We propose an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency as well as a low error floor.
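The summary does not specify how AdaFL adapts the precision; the sketch below shows standard unbiased stochastic quantization of a client update together with a placeholder round-dependent bit schedule standing in for the adaptive strategy.

```python
import torch

def stochastic_quantize(update: torch.Tensor, bits: int) -> torch.Tensor:
    """Quantize a client update to `bits` levels with unbiased stochastic rounding."""
    levels = 2 ** bits - 1
    lo, hi = update.min(), update.max()
    scale = (hi - lo).clamp(min=1e-8) / levels
    x = (update - lo) / scale
    x = torch.floor(x) + (torch.rand_like(x) < (x - torch.floor(x))).float()
    return x * scale + lo

def bits_for_round(t: int, total_rounds: int, low: int = 2, high: int = 8) -> int:
    # Placeholder schedule: spend fewer bits early, more bits later.
    return low + int((high - low) * t / max(total_rounds - 1, 1))

update = torch.randn(1000)
for t in range(5):
    q = stochastic_quantize(update, bits_for_round(t, total_rounds=5))
```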
arXiv Detail & Related papers (2021-02-08T19:14:21Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
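A minimal version of such a predictor, with an assumed graph encoding of sub-networks and random stand-in accuracies, could look like this:

```python
import torch
import torch.nn as nn

class GCNPredictor(nn.Module):
    def __init__(self, feat_dim=8, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, adj, feats):
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a_hat = adj + torch.eye(adj.size(-1))
        d_inv_sqrt = a_hat.sum(-1).clamp(min=1e-8).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(-1) * a_hat * d_inv_sqrt.unsqueeze(-2)
        h = torch.relu(self.w1(a_norm @ feats))
        h = torch.relu(self.w2(a_norm @ h))
        return self.out(h.mean(dim=-2)).squeeze(-1)  # graph-level prediction

# Fit the predictor on (sampled sub-network, measured accuracy) pairs.
model = GCNPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
adj = torch.randint(0, 2, (16, 7, 7)).float()  # 16 stand-in architecture graphs
feats = torch.randn(16, 7, 8)                  # per-node operation features
acc = torch.rand(16)                           # stand-in measured accuracies
for _ in range(100):
    loss = nn.functional.mse_loss(model(adj, feats), acc)
    opt.zero_grad()
    loss.backward()
    opt.step()
```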
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
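A bare-bones illustration of the subset idea, where each incremental growth step trains the newly added block on a random subset of the data; the model, growth rule, and subset size are placeholders, not the paper's algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_train = torch.randn(10_000, 20)
y_train = torch.randint(0, 3, (10_000,))

blocks, in_dim, num_classes = nn.ModuleList(), 20, 3
for step in range(4):  # progressively add hidden blocks
    blocks.append(nn.Sequential(nn.Linear(in_dim if not blocks else 64, 64), nn.ReLU()))
    head = nn.Linear(64, num_classes)
    params = list(blocks[-1].parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)

    # Train this step on a random subset instead of the full training set.
    idx = torch.randperm(len(x_train))[:2_000]
    xs, ys = x_train[idx], y_train[idx]
    for _ in range(50):
        h = xs
        for blk in blocks:
            h = blk(h)
        loss = F.cross_entropy(head(h), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
```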
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.