Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification
- URL: http://arxiv.org/abs/2512.10602v1
- Date: Thu, 11 Dec 2025 12:51:42 GMT
- Title: Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification
- Authors: Hendrik Borras, Yong Wu, Bernhard Klein, Holger Fröning
- Abstract summary: We introduce a systematic multi-level quantization framework for Variational Inference-based BNNs. We demonstrate that BNNs can be quantized down to 4-bit precision while maintaining both classification accuracy and uncertainty disentanglement.
- Score: 9.160739594332036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian Neural Networks (BNNs) provide principled uncertainty quantification but suffer from substantial computational and memory overhead compared to deterministic networks. While quantization techniques have successfully reduced resource requirements in standard deep learning models, their application to probabilistic models remains largely unexplored. We introduce a systematic multi-level quantization framework for Stochastic Variational Inference based BNNs that distinguishes between three quantization strategies: Variational Parameter Quantization (VPQ), Sampled Parameter Quantization (SPQ), and Joint Quantization (JQ). Our logarithmic quantization of variance parameters and specialized activation functions that preserve the distributional structure are essential for calibrated uncertainty estimation. Through comprehensive experiments on Dirty-MNIST, we demonstrate that BNNs can be quantized down to 4-bit precision while maintaining both classification accuracy and uncertainty disentanglement. At 4 bits, Joint Quantization achieves up to 8x memory reduction compared to floating-point implementations with minimal degradation in epistemic and aleatoric uncertainty estimation. These results enable deployment of BNNs on resource-constrained edge devices and provide design guidelines for future analog "Bayesian Machines" operating at inherently low precision.
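The three quantization levels differ only in where the quantizer sits relative to the sampling step: VPQ rounds the variational parameters before any weights are drawn, SPQ rounds the sampled weights, and JQ does both. Below is a minimal NumPy sketch of that pipeline for a Gaussian mean-field posterior, with a logarithmic grid for the standard deviations as motivated in the abstract; the bit-widths, quantizer ranges, and function names are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch of VPQ / SPQ / JQ for a Gaussian mean-field posterior.
# Quantizer ranges, bit-widths, and names are assumptions, not the paper's code.
import numpy as np

def uniform_quantize(x, bits=4):
    """Symmetric uniform quantizer over the observed range of x (illustrative)."""
    half_levels = 2 ** (bits - 1) - 1                  # e.g. 7 positive levels at 4 bits
    scale = np.max(np.abs(x)) / half_levels + 1e-12
    return np.clip(np.round(x / scale), -half_levels, half_levels) * scale

def log_quantize_sigma(sigma, bits=4, s_min=1e-4, s_max=1.0):
    """Logarithmic quantizer for standard deviations: a uniform grid in log-space,
    so small sigmas keep relative precision (the [s_min, s_max] range is assumed)."""
    levels = 2 ** bits - 1
    step = (np.log(s_max) - np.log(s_min)) / levels
    log_s = np.clip(np.log(sigma), np.log(s_min), np.log(s_max))
    return np.exp(np.log(s_min) + np.round((log_s - np.log(s_min)) / step) * step)

def draw_weights(mu, sigma, mode, bits=4, rng=None):
    """One reparameterized weight sample under VPQ, SPQ, or JQ."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if mode in ("VPQ", "JQ"):            # quantize the variational parameters first
        mu, sigma = uniform_quantize(mu, bits), log_quantize_sigma(sigma, bits)
    w = mu + sigma * rng.standard_normal(mu.shape)      # w ~ N(mu, sigma^2)
    if mode in ("SPQ", "JQ"):            # quantize the sampled weights
        w = uniform_quantize(w, bits)
    return w

rng = np.random.default_rng(1)
mu = 0.1 * rng.standard_normal((16, 8))
sigma = np.full((16, 8), 0.05)
for mode in ("VPQ", "SPQ", "JQ"):
    print(mode, float(draw_weights(mu, sigma, mode, rng=rng).std()))
```

The log-spaced grid keeps relative resolution for small standard deviations, which is what the abstract credits for preserving calibrated epistemic and aleatoric uncertainty down to 4 bits.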
Related papers
- Starting Positions Matter: A Study on Better Weight Initialization for Neural Network Quantization [71.44469196328507]
Quantization-specific model development techniques such as regularization, quantization-aware training, and quantization-robustness penalties have served to greatly boost the accuracy and robustness of modern DNNs. We present an extensive study examining the effects of different weight initializations on a variety of CNN building blocks commonly used in efficient CNNs. Next, we explore a new method for quantization-robust CNN initialization -- using Graph Hypernetworks (GHN) to predict parameters of quantized DNNs.
arXiv Detail & Related papers (2025-06-12T08:11:34Z) - Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework [3.0617189749929348]
We propose AdaInit to mitigate barren plateaus (BPs) in quantum neural networks (QNNs). AdaInit iteratively synthesizes initial parameters for QNNs that yield non-negligible gradient variance, thereby mitigating BPs. We provide rigorous theoretical analyses of the submartingale-based process and empirically validate that AdaInit consistently outperforms existing methods in maintaining higher gradient variance across various QNN scales.
arXiv Detail & Related papers (2025-02-17T05:57:15Z) - Positional Encoder Graph Quantile Neural Networks for Geographic Data [4.277516034244117]
We propose a novel framework that combines PE-GNNs with Quantile Neural Networks, partially monotonic neural blocks, and post-hoc recalibration techniques. The PE-GQNN enables flexible and robust conditional density estimation with minimal assumptions about the target distribution, and it extends naturally to tasks beyond spatial data.
arXiv Detail & Related papers (2024-09-27T16:02:12Z) - Variational Neural Networks [88.24021148516319]
We propose a method for uncertainty estimation in neural networks called Variational Neural Network (VNN).
VNN generates parameters for the output distribution of a layer by transforming its inputs with learnable sub-layers.
In uncertainty quality estimation experiments, we show that VNNs achieve better uncertainty quality than Monte Carlo Dropout or Bayes By Backpropagation methods.
arXiv Detail & Related papers (2022-07-04T15:41:02Z) - Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z) - A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification [0.0]
A promising approach is quantization, in which the full-precision values are stored in low bit-width precision.
We present a comprehensive survey of quantization concepts and methods, with a focus on image classification.
We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization.
arXiv Detail & Related papers (2022-05-14T15:08:32Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Q-SpiNN: A Framework for Quantizing Spiking Neural Networks [14.727296040550392]
A prominent technique for reducing the memory footprint of Spiking Neural Networks (SNNs) without decreasing the accuracy significantly is quantization.
We propose Q-SpiNN, a novel quantization framework for memory-efficient SNNs.
For the unsupervised network, Q-SpiNN reduces the memory footprint by ca. 4x while maintaining accuracy within 1% of the baseline on the MNIST dataset.
For the supervised network, Q-SpiNN reduces the memory by ca. 2x while keeping accuracy within 2% of the baseline on the DVS-Gesture dataset.
arXiv Detail & Related papers (2021-07-05T06:01:15Z) - A Statistical Framework for Low-bitwidth Training of Deep Neural Networks [70.77754244060384]
Fully quantized training (FQT) uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model.
One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties.
arXiv Detail & Related papers (2020-10-27T13:57:33Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables and use a differentiable method to search for them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z) - Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond [18.14282813812512]
Batch Normalization (BN) poses a challenge for Quantized Neural Networks (QNNs).
We propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with shared quantized scale.
Our method is verified by layer-level experiments on the CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-08-30T09:33:29Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of full-precision networks.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)