Neural Network Quantization for Efficient Inference: A Survey
- URL: http://arxiv.org/abs/2112.06126v1
- Date: Wed, 8 Dec 2021 22:49:39 GMT
- Title: Neural Network Quantization for Efficient Inference: A Survey
- Authors: Olivia Weng
- Abstract summary: Neural network quantization has recently arisen to meet this demand of reducing the size and complexity of neural networks.
This paper surveys the many neural network quantization techniques that have been developed in the last decade.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As neural networks have become more powerful, there has been a rising desire
to deploy them in the real world; however, the power and accuracy of neural
networks is largely due to their depth and complexity, making them difficult to
deploy, especially in resource-constrained devices. Neural network quantization
has recently arisen to meet this demand of reducing the size and complexity of
neural networks by reducing the precision of a network. With smaller and
simpler networks, it becomes possible to run neural networks within the
constraints of their target hardware. This paper surveys the many neural
network quantization techniques that have been developed in the last decade.
Based on this survey and comparison of neural network quantization techniques,
we propose future directions of research in the area.
Related papers
- CTRQNets & LQNets: Continuous Time Recurrent and Liquid Quantum Neural Networks [76.53016529061821]
Liquid Quantum Neural Network (LQNet) and Continuous Time Recurrent Quantum Neural Network (CTRQNet) developed.
LQNet and CTRQNet achieve accuracy increases as high as 40% on CIFAR 10 through binary classification.
arXiv Detail & Related papers (2024-08-28T00:56:03Z) - Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
arXiv Detail & Related papers (2024-05-07T12:20:12Z) - Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation [36.41451383422967]
In scientific applications, the scale of neural networks is generally moderate-size, mainly to ensure the speed of inference.
Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity.
We propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems.
arXiv Detail & Related papers (2024-05-02T06:53:40Z) - Message Passing Variational Autoregressive Network for Solving Intractable Ising Models [6.261096199903392]
Many deep neural networks have been used to solve Ising models, including autoregressive neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks.
Here we propose a variational autoregressive architecture with a message passing mechanism, which can effectively utilize the interactions between spin variables.
The new network trained under an annealing framework outperforms existing methods in solving several prototypical Ising spin Hamiltonians, especially for larger spin systems at low temperatures.
arXiv Detail & Related papers (2024-04-09T11:27:07Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Neural Network Pruning as Spectrum Preserving Process [7.386663473785839]
We identify the close connection between matrix spectrum learning and neural network training for dense and convolutional layers.
We propose a matrix sparsification algorithm tailored for neural network pruning that yields better pruning result.
arXiv Detail & Related papers (2023-07-18T05:39:32Z) - A Faster Approach to Spiking Deep Convolutional Neural Networks [0.0]
Spiking neural networks (SNNs) have closer dynamics to the brain than current deep neural networks.
We propose a network structure based on previous work to improve network runtime and accuracy.
arXiv Detail & Related papers (2022-10-31T16:13:15Z) - Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z) - Building Compact and Robust Deep Neural Networks with Toeplitz Matrices [93.05076144491146]
This thesis focuses on the problem of training neural networks which are compact, easy to train, reliable and robust to adversarial examples.
We leverage the properties of structured matrices from the Toeplitz family to build compact and secure neural networks.
arXiv Detail & Related papers (2021-09-02T13:58:12Z) - Binary Neural Network for Speaker Verification [13.472791713805762]
This paper focuses on how to apply binary neural networks to the task of speaker verification.
Experiment results show that, after binarizing the Convolutional Neural Network, the ResNet34-based network achieves an EER of around 5%.
arXiv Detail & Related papers (2021-04-06T06:04:57Z) - Binary Neural Networks: A Survey [126.67799882857656]
The binary neural network serves as a promising technique for deploying deep models on resource-limited devices.
The binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network.
We present a survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error.
arXiv Detail & Related papers (2020-03-31T16:47:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.