Related papers: Batch Normalization-Free Fully Integer Quantized Neural Networks via Progressive Tandem Learning

URL: http://arxiv.org/abs/2512.16476v1
Date: Thu, 18 Dec 2025 12:47:18 GMT
Title: Batch Normalization-Free Fully Integer Quantized Neural Networks via Progressive Tandem Learning
Authors: Pengfei Sun, Wenyu Jiang, Piew Yoong Chee, Paul Devos, Dick Botteldooren
Abstract summary: Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic. We present a BN-free, fully integer QNN trained via a progressive, layer-wise distillation scheme. On ImageNet with AlexNet, the BN-free model attains competitive Top-1 accuracy under aggressive quantisation.
Score: 16.532309126474843
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic, yet most still depend on a running statistics batch normalisation (BN) layer, preventing true integer-only deployment. Prior attempts remove BN by parameter folding or tailored initialisation; while helpful, they rarely recover BN's stability and accuracy and often impose bespoke constraints. We present a BN-free, fully integer QNN trained via a progressive, layer-wise distillation scheme that slots into existing low-bit pipelines. Starting from a pretrained BN-enabled teacher, we use layer-wise targets and progressive compensation to train a student that performs inference exclusively with integer arithmetic and contains no BN operations. On ImageNet with AlexNet, the BN-free model attains competitive Top-1 accuracy under aggressive quantisation. The procedure integrates directly with standard quantisation workflows, enabling end-to-end integer-only inference for resource-constrained settings such as edge and embedded devices.
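The abstract names parameter folding as one prior way to remove BN at inference time. The following is a minimal sketch of that general idea only, not the progressive tandem learning scheme of this paper. It assumes a floating-point linear layer followed by BN with frozen running statistics; the function names `fold_bn_into_linear` and `linear_then_bn` are hypothetical and chosen for illustration.

```python
import numpy as np


def linear_then_bn(x, W, b, gamma, beta, mean, var, eps=1e-5):
    """Reference path: linear layer followed by inference-mode BN."""
    z = x @ W.T + b
    return gamma * (z - mean) / np.sqrt(var + eps) + beta


def fold_bn_into_linear(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold frozen BN statistics into the preceding layer's weights and bias.

    BN(Wx + b) = scale * (Wx + b - mean) + beta, with
    scale = gamma / sqrt(var + eps), so the folded layer is
    W' = scale * W (per output channel) and b' = scale * (b - mean) + beta.
    """
    scale = gamma / np.sqrt(var + eps)
    W_folded = W * scale[:, None]  # scale each output row of W
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 3))           # batch of 8 inputs, 3 features
    W = rng.normal(size=(4, 3))           # 4 output channels
    b = rng.normal(size=4)
    gamma = rng.normal(size=4)
    beta = rng.normal(size=4)
    mean = rng.normal(size=4)
    var = rng.uniform(0.5, 2.0, size=4)   # variances must be positive

    W_f, b_f = fold_bn_into_linear(W, b, gamma, beta, mean, var)
    reference = linear_then_bn(x, W, b, gamma, beta, mean, var)
    folded = x @ W_f.T + b_f
    print(np.allclose(reference, folded))
```

Folding is exact in floating point, but the folded weights still have to be quantised afterwards, which is where the accuracy loss the abstract refers to can arise.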

Related papers

Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately. We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions. Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z)
Batch Normalization Preconditioning for Neural Network Training [7.709342743709842]
Batch normalization (BN) is a popular and ubiquitous method in deep learning. BN is not suitable for use with very small mini-batch sizes or online learning. We propose a new method called Batch Normalization Preconditioning (BNP).
arXiv Detail & Related papers (2021-08-02T18:17:26Z)
Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization [13.340254606150232]
Huge computational costs brought by convolution and batch normalization (BN) have caused great challenges for the online training and corresponding applications of deep neural networks (DNNs). Existing works focus only on convolution or BN acceleration, and no solution alleviates both problems with satisfactory performance. Online training has gradually become a trend on resource-limited devices like mobile phones, while there is still no complete technical scheme with acceptable model performance, processing speed, and computational cost.
arXiv Detail & Related papers (2021-05-27T14:58:04Z)
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization [92.23297927690149]
Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNNs). We extend their framework to training BNNs, and for the first time demonstrate that BNs can be completely removed from BNN training and inference regimes.
arXiv Detail & Related papers (2021-04-16T16:46:57Z)
MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency of network training. We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, like a BN layer. MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% in memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). We propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN). LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks. We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method. We show that MABN can completely restore the performance of vanilla BN in small batch cases. Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.