Layerwise Progressive Freezing Enables STE-Free Training of Deep Binary Neural Networks
- URL: http://arxiv.org/abs/2601.22660v1
- Date: Fri, 30 Jan 2026 07:26:10 GMT
- Title: Layerwise Progressive Freezing Enables STE-Free Training of Deep Binary Neural Networks
- Authors: Evan Gibson Smith, Bashima Islam
- Abstract summary: We investigate progressive freezing as an alternative to straight-through estimators (STE) for training binary networks from scratch. Under controlled training conditions, we find that while global progressive freezing works for binary-weight networks, it fails for full binary neural networks due to activation-induced gradient blockades. We introduce StoMPP, which uses layerwise stochastic masking to progressively replace differentiable clipped weights/activations with hard binary step functions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate progressive freezing as an alternative to straight-through estimators (STE) for training binary networks from scratch. Under controlled training conditions, we find that while global progressive freezing works for binary-weight networks, it fails for full binary neural networks due to activation-induced gradient blockades. We introduce StoMPP (Stochastic Masked Partial Progressive Binarization), which uses layerwise stochastic masking to progressively replace differentiable clipped weights/activations with hard binary step functions, while only backpropagating through the unfrozen (clipped) subset (i.e., no straight-through estimator). Under a matched minimal training recipe, StoMPP improves accuracy over a BinaryConnect-style STE baseline, with gains that increase with depth (e.g., for ResNet-50 BNN: +18.0 on CIFAR-10, +13.5 on CIFAR-100, and +3.8 on ImageNet; for ResNet-18: +3.1, +4.7, and +1.3). For binary-weight networks, StoMPP achieves 91.2% accuracy on CIFAR-10 and 69.5% on CIFAR-100 with ResNet-50. We analyze training dynamics under progressive freezing, revealing non-monotonic convergence and improved depth scaling under binarization constraints.
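The mechanism described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration of the idea as stated (stochastic masks selecting hard binary units, gradients flowing only through the clipped subset, no straight-through estimator); the function names, the zero tie-break, and the freeze schedule are our own assumptions, not details taken from the paper.

```python
import numpy as np

def stomp_forward(w, frozen_mask):
    """Mix hard-binarized (frozen) and clipped (unfrozen) weights.

    frozen_mask True  -> hard binary step on w, no gradient path
    frozen_mask False -> clip(w, -1, 1), the differentiable surrogate
    """
    hard = np.where(w >= 0, 1.0, -1.0)   # tie-break 0 -> +1 (our assumption)
    soft = np.clip(w, -1.0, 1.0)
    return np.where(frozen_mask, hard, soft)

def stomp_grad(w, frozen_mask, upstream):
    """Backpropagate only through the unfrozen clipped subset (no STE):
    d/dw clip(w, -1, 1) is 1 inside (-1, 1) and 0 outside; frozen units
    contribute zero gradient instead of a straight-through pass-through."""
    unfrozen_inside = (np.abs(w) < 1.0) & ~frozen_mask
    return upstream * unfrozen_inside

# Layerwise progressive freezing: a layer's freeze probability is ramped
# from 0 to 1 over training, drawing a fresh stochastic mask each step.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
for p_freeze in (0.0, 0.5, 1.0):
    mask = rng.random(8) < p_freeze
    out = stomp_forward(w, mask)
    grad = stomp_grad(w, mask, np.ones(8))
```

At `p_freeze = 1` the layer is fully binary and receives no gradient, matching the end state of progressive binarization.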
Related papers
- Layer-wise QUBO-Based Training of CNN Classifiers for Quantum Annealing [0.0]
We propose an iterative framework based on Quadratic Unconstrained Binary Optimization (QUBO) for training the head of convolutional neural networks (CNNs). A per-output decomposition splits the $C$-class problem into $C$ independent QUBOs, each with $(d+1)K$ binary variables, where $d$ is the feature dimension and $K$ is the bit precision. We evaluate the method on six image-classification benchmarks (sklearn digits, MNIST, Fashion-MNIST, CIFAR-10, EMNIST, KMNIST).
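As a quick sanity check on the sizes quoted above: each per-class QUBO has $(d+1)K$ binary variables ($d$ feature weights plus a bias, each encoded in $K$ bits), and the $C$-class problem decomposes into $C$ such independent QUBOs. The concrete values of $d$, $K$, and $C$ below are our own illustrative example, not numbers from the paper.

```python
def qubo_vars_per_class(d, K):
    """(d weights + 1 bias) coefficients, each encoded in K bits."""
    return (d + 1) * K

# Hypothetical example: 64-dim features, 4-bit precision, 10 classes.
per_class = qubo_vars_per_class(64, 4)   # 260 binary variables per QUBO
total = 10 * per_class                   # 2600 across all independent QUBOs
```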
arXiv Detail & Related papers (2026-03-03T13:10:36Z) - Progressive Supernet Training for Efficient Visual Autoregressive Modeling [56.15415456746672]
We propose a training strategy that breaks through the frontier of generation quality for both paradigms and the full network. Experiments on ImageNet demonstrate that, compared to the pretrained VAR-d30, VARiant-d16 and VARiant-d8 achieve nearly equivalent quality. VARiant-d2 achieves 3.5 times speedup and 80% memory reduction at moderate quality cost.
arXiv Detail & Related papers (2025-11-20T16:59:24Z) - NM-Hebb: Coupling Local Hebbian Plasticity with Metric Learning for More Accurate and Interpretable CNNs [0.0]
NM-Hebb integrates neuro-inspired local plasticity with distance-aware supervision. Phase 1 extends standard supervised training by jointly optimising a cross-entropy objective. Phase 2 fine-tunes the backbone with a pairwise metric-learning loss.
arXiv Detail & Related papers (2025-08-27T13:53:04Z) - Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning [32.918269107547616]
Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks. Recent sparse learning methods have shown promising performance up to moderate sparsity levels such as 95% and 98%. We propose a collection of techniques that enable the continuous learning of networks without accuracy collapse even at extreme sparsities.
arXiv Detail & Related papers (2024-11-20T18:54:53Z) - Improved techniques for deterministic l2 robustness [63.34032156196848]
Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the $l_2$ norm is useful for adversarial robustness, interpretable gradients and stable training.
We introduce a procedure to certify robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer network.
We significantly advance the state-of-the-art for standard and provable robust accuracies on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-11-15T19:10:12Z) - Differentially private training of residual networks with scale normalisation [64.60453677988517]
We investigate the optimal choice of replacement layer for Batch Normalisation (BN) in residual networks (ResNets).
We study the phenomenon of scale mixing in residual blocks, whereby the activations on the two branches are scaled differently.
arXiv Detail & Related papers (2022-03-01T09:56:55Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [62.932299614630985]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients. FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation [4.023728681102073]
Binary CNNs can significantly reduce the number of arithmetic operations and the size of memory storage.
However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures.
We propose a Piecewise Approximation scheme for multiple binary CNNs which lessens accuracy loss by approximating full precision weights and activations.
arXiv Detail & Related papers (2020-08-08T13:32:33Z) - Distillation Guided Residual Learning for Binary Convolutional Neural Networks [83.6169936912264]
It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating-point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and the FCNN.
To minimize the performance gap, we train the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to more effective optimization of the BCNN.
arXiv Detail & Related papers (2020-07-10T07:55:39Z) - Convolutional Neural Network Training with Distributed K-FAC [14.2773046188145]
Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix.
We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale.
arXiv Detail & Related papers (2020-07-01T22:00:53Z) - Training Binary Neural Networks with Real-to-Binary Convolutions [52.91164959767517]
We show how to train binary networks to within a few percentage points of their full-precision counterparts.
We show how to build a strong baseline, which already achieves state-of-the-art accuracy.
We show that, when putting all of our improvements together, the proposed model beats the current state of the art by more than 5% top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2020-03-25T17:54:38Z) - RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks [23.45606380793965]
We present Random Partition Relaxation (RPR), a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values.
We demonstrate binary and ternary-weight networks with accuracies beyond the state-of-the-art for GoogLeNet and competitive performance for ResNet-18 and ResNet-50.
arXiv Detail & Related papers (2020-01-04T15:56:10Z)
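A minimal sketch of the quantization targets and the random-partition idea in the RPR entry above; the dead-zone threshold, relaxation fraction, and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def binarize(w):
    """Quantize weights to {+1, -1}; zeros map to +1 (our tie-break)."""
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, threshold=0.05):
    """Quantize weights to {+1, 0, -1} with a dead zone around zero."""
    return np.where(np.abs(w) <= threshold, 0.0, np.sign(w))

def rpr_step(w, relax_frac=0.25, rng=None):
    """One RPR-style step (sketch): quantize most weights, but relax a
    random partition back to continuous values for further training."""
    if rng is None:
        rng = np.random.default_rng()
    relaxed = rng.random(w.shape) < relax_frac
    return np.where(relaxed, w, ternarize(w)), relaxed
```

Alternating such relax/re-quantize steps lets the continuous partition absorb the error introduced by the quantized one, which is the intuition behind the method.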
This list is automatically generated from the titles and abstracts of the papers in this site.