A Principled Bayesian Framework for Training Binary and Spiking Neural Networks
- URL: http://arxiv.org/abs/2505.17962v1
- Date: Fri, 23 May 2025 14:33:20 GMT
- Title: A Principled Bayesian Framework for Training Binary and Spiking Neural Networks
- Authors: James A. Walker, Moein Khajehnejad, Adeel Razi
- Abstract summary: Spiking Bayesian Neural Networks (SBNNs) are a variational inference framework that uses posterior noise to train Binary and Spiking Neural Networks with IW-ST. By linking low-bias conditions, vanishing gradients, and the KL term, we enable training of deep residual networks without normalisation.
- Score: 1.6658912537684454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a Bayesian framework for training binary and spiking neural networks that achieves state-of-the-art performance without normalisation layers. Unlike commonly used surrogate gradient methods -- often heuristic and sensitive to hyperparameter choices -- our approach is grounded in a probabilistic model of noisy binary networks, enabling fully end-to-end gradient-based optimisation. We introduce importance-weighted straight-through (IW-ST) estimators, a unified class generalising straight-through and relaxation-based estimators. We characterise the bias-variance trade-off in this family and derive a bias-minimising objective implemented via an auxiliary loss. Building on this, we introduce Spiking Bayesian Neural Networks (SBNNs), a variational inference framework that uses posterior noise to train Binary and Spiking Neural Networks with IW-ST. This Bayesian approach minimises gradient bias, regularises parameters, and introduces dropout-like noise. By linking low-bias conditions, vanishing gradients, and the KL term, we enable training of deep residual networks without normalisation. Experiments on CIFAR-10, DVS Gesture, and SHD show our method matches or exceeds existing approaches without normalisation or hand-tuned gradients.
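For a concrete starting point, the PyTorch sketch below implements the two ingredients the abstract combines, in their simplest form: a stochastic binary unit trained with a plain straight-through gradient, and a Bernoulli KL penalty acting as the variational regulariser. It is an illustration only; the importance weighting that defines the paper's IW-ST estimator is not reproduced, and the framework choice, the names (StochasticBinaryST, bernoulli_kl) and the hyperparameters are assumptions made for the example.

```python
import torch

class StochasticBinaryST(torch.autograd.Function):
    """Bernoulli sampling in the forward pass, straight-through gradient in the backward pass."""

    @staticmethod
    def forward(ctx, p):
        # p: firing probability in (0, 1)
        ctx.save_for_backward(p)   # kept in case a probability-dependent reweighting is added
        return torch.bernoulli(p)

    @staticmethod
    def backward(ctx, grad_output):
        # Plain straight-through: pass the gradient through unchanged.
        # The paper's IW-ST estimator would reweight this gradient instead (not reproduced here).
        return grad_output

def bernoulli_kl(p, prior_p=0.5, eps=1e-7):
    """KL( Bernoulli(p) || Bernoulli(prior_p) ), the KL regulariser of a variational objective."""
    p = p.clamp(eps, 1 - eps)
    return (p * torch.log(p / prior_p)
            + (1 - p) * torch.log((1 - p) / (1 - prior_p))).sum()

# Minimal usage: one stochastic binary layer trained with the ST gradient plus a KL penalty.
torch.manual_seed(0)
x = torch.randn(32, 10)
target = torch.randn(32, 1)
w = torch.randn(10, 1, requires_grad=True)

logits = x @ w                   # pre-activation ("membrane potential")
p = torch.sigmoid(logits)        # firing probability
s = StochasticBinaryST.apply(p)  # binary sample with a straight-through gradient
loss = ((s - target) ** 2).mean() + 1e-3 * bernoulli_kl(p)
loss.backward()
print(w.grad.shape)              # torch.Size([10, 1])
```

In the paper's framework, the backward pass above would additionally be reweighted, and that reweighting is where the bias-variance trade-off described in the abstract enters.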
Related papers
- Approximation and Gradient Descent Training with Neural Networks [0.0]
Recent work extends a neural tangent kernel (NTK) optimization argument to an under-parametrized regime.
This paper establishes analogous results for networks trained by gradient descent.
arXiv Detail & Related papers (2024-05-19T23:04:09Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient signal-to-noise ratio (GSNR) of the network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Likelihood-free inference methods yield posterior approximations for simulator models with intractable likelihood.
Many works trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z) - Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent [95.94432031144716]
We propose a unified non-convex optimization framework for the analysis of neural network training.
We show that existing guarantees for networks trained by gradient descent can be unified within this framework.
arXiv Detail & Related papers (2021-06-25T17:45:00Z) - A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training.
It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods.
Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
arXiv Detail & Related papers (2021-03-12T22:18:34Z) - Bayesian Nested Neural Networks for Uncertainty Calibration and Adaptive Compression [40.35734017517066]
Nested networks, or slimmable networks, are neural networks whose architectures can be adjusted instantly at test time.
Recent studies have focused on a "nested dropout" layer, which is able to order the nodes of a layer by importance during training.
arXiv Detail & Related papers (2021-01-27T12:34:58Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
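The entry above does not spell out why training speed and marginal likelihood are connected. A standard identity (stated here for orientation, not quoted from the listed abstract) is the prequential decomposition of the log marginal likelihood into a sum of posterior-predictive terms, so a model whose sequential predictions improve quickly accumulates a larger sum:

```latex
\log p(\mathcal{D}) \;=\; \sum_{i=1}^{n} \log p\left(d_i \mid d_1, \dots, d_{i-1}\right)
```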
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.