Switchable Precision Neural Networks
- URL: http://arxiv.org/abs/2002.02815v1
- Date: Fri, 7 Feb 2020 14:43:44 GMT
- Title: Switchable Precision Neural Networks
- Authors: Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond
- Abstract summary: Switchable Precision neural Networks (SP-Nets) are proposed to train a shared network capable of operating at multiple quantization levels.
At runtime, the network can adjust its precision on the fly according to instant memory, latency, power consumption and accuracy demands.
- Score: 35.2752928147013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An instantaneous and on-demand accuracy-efficiency trade-off has recently been
explored in the context of neural network slimming. In this paper, we propose
a flexible quantization strategy, termed Switchable Precision neural Networks
(SP-Nets), to train a shared network capable of operating at multiple
quantization levels. At runtime, the network can adjust its precision on the
fly according to instant memory, latency, power consumption and accuracy
demands. For example, by constraining the network weights to 1-bit with
switchable precision activations, our shared network spans from BinaryConnect
to Binarized Neural Network, allowing dot-products to be performed using only
summations or bit operations. In addition, a self-distillation scheme is
proposed to increase the performance of the quantized switches. We test our
approach with three different quantizers and demonstrate the performance of
SP-Nets against independently trained quantized models, in terms of classification
accuracy, on the Tiny ImageNet and ImageNet datasets using ResNet-18 and MobileNet
architectures.
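To make the runtime switching concrete, below is a minimal PyTorch-style sketch of a layer with shared 1-bit weights and a per-call activation bit-width, roughly spanning the BinaryConnect-to-BNN range described above. The names `SwitchableLinear` and `quantize_activations`, the uniform activation quantizer, and the straight-through estimator details are illustrative assumptions, not the paper's released implementation.

```python
# Hedged sketch of a switchable-precision layer: one set of shared weights,
# activation precision chosen at call time. Assumed names, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_activations(x, bits):
    """Uniformly quantize activations to `bits` bits in [0, 1] with a
    straight-through estimator; `bits=None` keeps full precision."""
    if bits is None:                           # full-precision switch
        return x
    x = torch.clamp(x, 0.0, 1.0)
    levels = 2 ** bits - 1
    x_q = torch.round(x * levels) / levels
    return x + (x_q - x).detach()              # straight-through gradient


class SwitchableLinear(nn.Module):
    """Shared weights binarized with sign(); choosing the activation bit-width
    per forward pass lets one layer span BinaryConnect-like (full-precision
    activations) to BNN-like (1-bit activations) operation."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))

    def forward(self, x, act_bits=None):
        # 1-bit weights with a straight-through gradient to the shared weights
        w_b = self.weight + (self.weight.sign() - self.weight).detach()
        return F.linear(quantize_activations(x, act_bits), w_b)


# Runtime precision switching: the same parameters serve every switch.
layer = SwitchableLinear(16, 8)
x = torch.rand(4, 16)
for bits in (None, 4, 2, 1):                   # BinaryConnect-like down to BNN-like
    print(bits, layer(x, act_bits=bits).shape)
```

During training, the paper's self-distillation scheme would additionally let the highest-precision switch supervise the lower-precision ones; a hedged version of that idea adds a distillation term (for example, a KL divergence between the full-precision and quantized logits) to each sampled switch's loss.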
Related papers
- Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators [0.20971479389679332]
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors.
We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings.
CNNs utilizing quantized weights and activations together with suitable mappings can significantly improve the trade-offs among accuracy, energy, and memory requirements.
arXiv Detail & Related papers (2024-04-08T10:10:30Z) - Bayesian Inference Accelerator for Spiking Neural Networks [3.145754107337963]
Spiking neural networks (SNNs) have the potential to reduce computational area and power.
In this work, we demonstrate an optimization framework for developing and implementing efficient Bayesian SNNs in hardware.
We demonstrate accuracies comparable to Bayesian binary networks with full-precision Bernoulli parameters, while requiring up to $25\times$ fewer spikes.
arXiv Detail & Related papers (2024-01-27T16:27:19Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous
Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve a network of any precision for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - Standard Deviation-Based Quantization for Deep Neural Networks [17.495852096822894]
Quantization of deep neural networks is a promising approach that reduces the inference cost.
We propose a new framework to learn the quantization intervals (discrete values) using the knowledge of the network's weight and activation distributions.
Our scheme simultaneously prunes the network's parameters and allows us to flexibly adjust the pruning ratio during the quantization process.
arXiv Detail & Related papers (2022-02-24T23:33:47Z) - Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Scalable Verification of Quantized Neural Networks (Technical Report) [14.04927063847749]
We show that verifying bit-exact implementations of quantized neural networks with bit-vector specifications is PSPACE-hard.
We propose three techniques for making SMT-based verification of quantized neural networks more scalable.
arXiv Detail & Related papers (2020-12-15T10:05:37Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same fixed path for every input, DG-Net aggregates features dynamically in each node, which gives the network greater representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search for them accurately (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-09-18T09:13:26Z) - ReActNet: Towards Precise Binary Neural Network with Generalized
Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
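As referenced in the "Searching for Low-Bit Weights in Quantized Neural Networks" entry above, the following sketch illustrates one way to treat discrete weights as searchable variables: each weight keeps a logit per candidate low-bit value, a softmax relaxes the choice during training, and an argmax commits to a discrete value at inference. The class name `SearchableQuantLinear`, the candidate set, and the softmax relaxation are assumptions for illustration, not that paper's exact method.

```python
# Hedged sketch of a differentiable search over discrete weight values.
# Assumed names and candidate grid; not the cited paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SearchableQuantLinear(nn.Module):
    def __init__(self, in_features, out_features, candidates=(-1.0, -0.5, 0.5, 1.0)):
        super().__init__()
        self.register_buffer("candidates", torch.tensor(candidates))
        # One searchable logit per (output, input, candidate) triple.
        self.logits = nn.Parameter(
            torch.zeros(out_features, in_features, len(candidates)))

    def forward(self, x, hard=False):
        if hard:    # inference: commit each weight to its best discrete value
            w = self.candidates[self.logits.argmax(dim=-1)]
        else:       # training: differentiable expectation over the candidates
            w = (F.softmax(self.logits, dim=-1) * self.candidates).sum(dim=-1)
        return F.linear(x, w)


layer = SearchableQuantLinear(16, 8)
y_soft = layer(torch.rand(4, 16))              # differentiable during search
y_hard = layer(torch.rand(4, 16), hard=True)   # discrete low-bit weights
```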
This list is automatically generated from the titles and abstracts of the papers on this site.