Training Binary Neural Networks using the Bayesian Learning Rule
- URL: http://arxiv.org/abs/2002.10778v4
- Date: Tue, 18 Aug 2020 00:48:15 GMT
- Title: Training Binary Neural Networks using the Bayesian Learning Rule
- Authors: Xiangming Meng and Roman Bachmann and Mohammad Emtiyaz Khan
- Abstract summary: Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem.
We propose a principled approach for training binary neural networks which justifies and extends existing approaches.
- Score: 19.01146578435531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks with binary weights are computation-efficient and
hardware-friendly, but their training is challenging because it involves a
discrete optimization problem. Surprisingly, ignoring the discrete nature of
the problem and using gradient-based methods, such as the Straight-Through
Estimator, still works well in practice. This raises the question: are there
principled approaches which justify such methods? In this paper, we propose
such an approach using the Bayesian learning rule. The rule, when applied to
estimate a Bernoulli distribution over the binary weights, results in an
algorithm which justifies some of the algorithmic choices made by the previous
approaches. The algorithm not only obtains state-of-the-art performance, but
also enables uncertainty estimation for continual learning to avoid
catastrophic forgetting. Our work provides a principled approach for training
binary neural networks which justifies and extends existing approaches.
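As a concrete illustration of the approach described in the abstract, here is a minimal NumPy sketch of a Bayesian-learning-rule update for a mean-field Bernoulli posterior over weights in {-1, +1}: the posterior is kept through its natural parameters, a Gumbel-softmax style relaxation provides reparameterized weight samples, and the rule yields an exponentially decaying natural-parameter update. The function name, the temperature tau, and the gradient scaling below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_binary_step(lam, grad_fn, lr=0.1, tau=1.0):
    """One illustrative natural-parameter update for a mean-field
    Bernoulli posterior over binary weights in {-1, +1}.
    lam: natural parameters; grad_fn: gradient of the loss w.r.t.
    the (relaxed) weights; tau: relaxation temperature (assumed)."""
    mu = np.tanh(lam)                        # posterior mean E[w]
    # Reparameterized relaxed sample (Gumbel-softmax style; assumption).
    eps = rng.uniform(1e-6, 1 - 1e-6, size=lam.shape)
    delta = 0.5 * np.log(eps / (1.0 - eps))  # logistic noise
    w = np.tanh((lam + delta) / tau)         # relaxed binary weights
    # Chain rule through the relaxation gives this gradient scaling.
    scale = (1.0 - w ** 2) / (tau * (1.0 - mu ** 2) + 1e-10)
    # Bayesian learning rule: exponential decay of the old natural
    # parameter plus a scaled loss-gradient term.
    return (1.0 - lr) * lam - lr * scale * grad_fn(w)

# Toy usage: fit 5 binary weights to a target sign pattern under a
# quadratic loss, whose gradient is simply (w - target).
target = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
lam = np.zeros(5)
for _ in range(200):
    lam = bayesian_binary_step(lam, lambda w: w - target)
print(np.sign(lam))  # most probable binary weights: matches target
```

Note that when the relaxed weights saturate, the gradient scaling vanishes and the update reduces to a sign-preserving decay of the natural parameters, which loosely mirrors how latent-weight methods such as the Straight-Through Estimator behave.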
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- A lifted Bregman strategy for training unfolded proximal neural network Gaussian denoisers [8.343594411714934]
Unfolded proximal neural networks (PNNs) form a family of methods that combines deep learning and proximal optimization approaches.
We propose a lifted training formulation based on Bregman distances for unfolded PNNs.
We assess the behaviour of the proposed training approach for PNNs through numerical simulations on image denoising.
arXiv Detail & Related papers (2024-08-16T13:41:34Z)
- Neural Active Learning Beyond Bandits [69.99592173038903]
We study both stream-based and pool-based active learning with neural network approximations.
We propose two algorithms based on the newly designed exploitation and exploration neural networks for stream-based and pool-based active learning.
arXiv Detail & Related papers (2024-04-18T21:52:14Z)
- Discrete Neural Algorithmic Reasoning [18.497863598167257]
We propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states.
Trained with supervision on the algorithm's state transitions, such models are able to align perfectly with the original algorithm.
arXiv Detail & Related papers (2024-02-18T16:03:04Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on back-propagation (BP) for optimization.
Unlike FF, our framework directly outputs a label distribution at each cascaded block and therefore does not require generating additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks [34.263013539187355]
We propose a new algorithm for training deep neural networks (DNNs) with binary weights.
Experimental results demonstrate that our new algorithm offers favorable performance compared to existing approaches (the vanilla straight-through estimator it adapts is sketched after this list).
arXiv Detail & Related papers (2021-12-06T09:12:15Z)
- Deep learning via message passing algorithms based on belief propagation [2.931240348160871]
We present a family of BP-based message-passing algorithms with a reinforcement field that biases towards locally entropic distributions.
These algorithms are capable of training multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired solutions.
arXiv Detail & Related papers (2021-10-27T16:52:26Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks (the line parameterization is sketched after this list).
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
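For the AdaSTE entry above, it may help to see the baseline being adapted: the vanilla Straight-Through Estimator keeps latent real-valued weights, quantizes them with sign() in the forward pass, and passes the gradient through the quantizer as if it were the identity (with the usual hard-tanh clipping). This NumPy sketch is a generic illustration under those assumptions, not AdaSTE's adaptive variant.

```python
import numpy as np

def ste_step(w_real, grad_fn, lr=0.01):
    """One vanilla straight-through-estimator update (illustrative).
    Forward: quantize latent real weights with sign().
    Backward: pass the gradient straight through the quantizer,
    masked where |w_real| > 1 (the usual hard-tanh clipping)."""
    w_bin = np.sign(w_real)             # forward uses binary weights
    g = grad_fn(w_bin)                  # gradient w.r.t. binary weights
    g = g * (np.abs(w_real) <= 1.0)     # straight-through with clipping
    return np.clip(w_real - lr * g, -1.0, 1.0)

# Toy usage: fit binary weights to a target sign pattern.
target = np.array([1.0, -1.0, -1.0, 1.0])
w = np.full(4, 0.01)                    # start slightly off zero
for _ in range(500):
    w = ste_step(w, lambda wb: wb - target)
print(np.sign(w))                       # matches the target signs
```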
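Similarly, for the Learning Neural Network Subspaces entry, one reading of "learning a line of networks" is the parameterization w(a) = (1 - a) * w1 + a * w2, trained by sampling a point on the line at each step. The sketch below illustrates this on a toy quadratic loss; the paper's actual recipe (e.g., the regularizer it uses to keep the endpoints apart) may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def line_subspace_step(w1, w2, grad_fn, lr=0.05):
    """Train a line of models w(a) = (1 - a) * w1 + a * w2 by sampling
    one point on the line per step (a sketch of the idea only)."""
    a = rng.uniform()
    w = (1.0 - a) * w1 + a * w2
    g = grad_fn(w)
    # Chain rule: dL/dw1 = (1 - a) * g and dL/dw2 = a * g.
    return w1 - lr * (1.0 - a) * g, w2 - lr * a * g

# Toy usage: every point on the line should fit the same target,
# so both endpoints are driven toward it.
target = np.array([1.0, 2.0, 3.0])
w1, w2 = rng.normal(size=3), rng.normal(size=3)
for _ in range(2000):
    w1, w2 = line_subspace_step(w1, w2, lambda w: w - target)
print(np.round(w1, 2), np.round(w2, 2))  # both close to the target
```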
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.