Neural Network Activation Quantization with Bitwise Information Bottlenecks
- URL: http://arxiv.org/abs/2006.05210v1
- Date: Tue, 9 Jun 2020 12:10:04 GMT
- Title: Neural Network Activation Quantization with Bitwise Information Bottlenecks
- Authors: Xichuan Zhou, Kui Liu, Cong Shi, Haijun Liu, Ji Liu
- Abstract summary: This paper presents a Bitwise Information Bottleneck approach for quantizing and encoding neural network activations.
By minimizing the quantization rate-distortion of each layer, the neural network with information bottlenecks achieves state-of-the-art accuracy with low-precision activations.
- Score: 25.319181120172562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research on the information bottleneck sheds new light on ongoing
attempts to open the black box of neural signal encoding. Inspired by the
problem of lossy signal compression for wireless communication, this paper
presents a Bitwise Information Bottleneck approach for quantizing and encoding
neural network activations. Based on the rate-distortion theory, the Bitwise
Information Bottleneck attempts to determine the most significant bits in
activation representation by assigning and approximating the sparse coefficient
associated with each bit. Given the constraint of a limited average code rate,
the information bottleneck minimizes the rate-distortion for optimal activation
quantization in a flexible layer-by-layer manner. Experiments over ImageNet and
other datasets show that, by minimizing the quantization rate-distortion of
each layer, the neural network with information bottlenecks achieves
state-of-the-art accuracy with low-precision activations. Meanwhile, by reducing
the code rate, the proposed method can improve the memory and computational
efficiency more than sixfold compared with a deep neural network using the
standard single-precision representation. Code will be available on GitHub
when the paper is accepted: \url{https://github.com/BitBottleneck/PublicCode}.
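The abstract sketches the core idea: each bit of a quantized activation is assigned its own coefficient, and under a limited average code rate only the most significant bits are kept so that the reconstruction distortion is minimized. The snippet below is a rough, self-contained NumPy illustration of that bitwise rate-distortion trade-off, not the authors' released code; the function names (bit_planes, fit_coefficients, select_bits), the uniform 8-bit quantizer, the least-squares fit of the per-bit coefficients, and the greedy bit selection are all assumptions made for the sake of the example.

```python
# Minimal NumPy sketch of bitwise rate-distortion activation quantization,
# loosely following the abstract's description. All names and the greedy
# selection strategy are illustrative assumptions, not the paper's method.
import numpy as np

def bit_planes(x, n_bits=8):
    """Uniformly quantize non-negative activations to n_bits and return
    the binary bit planes with shape (n_bits, N), MSB first."""
    x = np.asarray(x, dtype=np.float64).ravel()
    scale = (x.max() + 1e-12) / (2 ** n_bits - 1)
    q = np.round(x / scale).astype(np.int64)
    planes = np.stack([(q >> b) & 1 for b in range(n_bits)][::-1])
    return planes.astype(np.float64), scale

def fit_coefficients(x, planes):
    """Least-squares coefficients alpha such that alpha @ planes ~ x."""
    alpha, *_ = np.linalg.lstsq(planes.T, np.asarray(x, float).ravel(), rcond=None)
    return alpha

def select_bits(x, planes, rate_budget):
    """Greedy rate-constrained selection: keep `rate_budget` bit planes,
    at each step adding the plane that most reduces the MSE distortion."""
    x = np.asarray(x, float).ravel()
    kept = []
    for _ in range(rate_budget):
        best, best_err = None, np.inf
        for b in range(planes.shape[0]):
            if b in kept:
                continue
            idx = kept + [b]
            alpha = fit_coefficients(x, planes[idx])
            err = np.mean((alpha @ planes[idx] - x) ** 2)
            if err < best_err:
                best, best_err = b, err
        kept.append(best)
    alpha = fit_coefficients(x, planes[kept])
    distortion = np.mean((alpha @ planes[kept] - x) ** 2)
    return kept, alpha, distortion

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    act = np.maximum(rng.normal(size=4096), 0.0)  # ReLU-like activations
    planes, _ = bit_planes(act, n_bits=8)
    for budget in (2, 4, 6):  # average code rate in bits per activation
        kept, alpha, mse = select_bits(act, planes, budget)
        print(f"rate={budget} bits, kept planes={sorted(kept)}, distortion(MSE)={mse:.5f}")
```

A typical run prints the distortion reached at average rates of 2, 4, and 6 bits, showing how tightening the code-rate constraint pushes the selection toward the most significant bit planes.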
Related papers
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Compression with Bayesian Implicit Neural Representations [16.593537431810237]
We propose overfitting variational neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it.
Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.
arXiv Detail & Related papers (2023-05-30T16:29:52Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder based on hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute-limited platforms.
ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size.
These improvements come at a minor performance degradation as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
- SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z)
- Data-free mixed-precision quantization using novel sensitivity metric [6.031526641614695]
We propose a novel sensitivity metric that considers the effect of quantization error on task loss and interaction with other layers.
Our experiments show that the proposed metric better represents quantization sensitivity, and generated data are more feasible to be applied to mixed-precision quantization.
arXiv Detail & Related papers (2021-03-18T07:23:21Z)
- Efficient bit encoding of neural networks for Fock states [77.34726150561087]
The complexity of the neural network scales only with the number of bit-encoded neurons rather than the maximum boson number.
In the high occupation regime its information compression efficiency is shown to surpass even maximally optimized density matrix implementations.
arXiv Detail & Related papers (2021-03-15T11:24:40Z)
- Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.