Training Binary Neural Networks using the Bayesian Learning Rule
- URL: http://arxiv.org/abs/2002.10778v4
- Date: Tue, 18 Aug 2020 00:48:15 GMT
- Title: Training Binary Neural Networks using the Bayesian Learning Rule
- Authors: Xiangming Meng and Roman Bachmann and Mohammad Emtiyaz Khan
- Abstract summary: Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem.
We propose a principled approach for training binary neural networks which justifies and extends existing approaches.
- Score: 19.01146578435531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks with binary weights are computation-efficient and
hardware-friendly, but their training is challenging because it involves a
discrete optimization problem. Surprisingly, ignoring the discrete nature of
the problem and using gradient-based methods, such as the Straight-Through
Estimator, still works well in practice. This raises the question: are there
principled approaches which justify such methods? In this paper, we propose
such an approach using the Bayesian learning rule. The rule, when applied to
estimate a Bernoulli distribution over the binary weights, results in an
algorithm which justifies some of the algorithmic choices made by the previous
approaches. The algorithm not only obtains state-of-the-art performance, but
also enables uncertainty estimation for continual learning to avoid
catastrophic forgetting. Our work provides a principled approach for training
binary neural networks which justifies and extends existing approaches.
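As a concrete illustration of the approach described in the abstract, here is a minimal NumPy sketch of a Bayesian-learning-rule update for a mean-field Bernoulli posterior over weights in {-1, +1}: the posterior is kept through its natural parameters, a Gumbel-softmax style relaxation provides reparameterized weight samples, and the rule yields an exponentially decaying natural-parameter update. The function name, the temperature tau, and the gradient scaling below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_binary_step(lam, grad_fn, lr=0.1, tau=1.0):
    """One illustrative natural-parameter update for a mean-field
    Bernoulli posterior over binary weights in {-1, +1}.
    lam: natural parameters; grad_fn: gradient of the loss w.r.t.
    the (relaxed) weights; tau: relaxation temperature (assumed)."""
    mu = np.tanh(lam)                        # posterior mean E[w]
    # Reparameterized relaxed sample (Gumbel-softmax style; assumption).
    eps = rng.uniform(1e-6, 1 - 1e-6, size=lam.shape)
    delta = 0.5 * np.log(eps / (1.0 - eps))  # logistic noise
    w = np.tanh((lam + delta) / tau)         # relaxed binary weights
    # Chain rule through the relaxation gives this gradient scaling.
    scale = (1.0 - w ** 2) / (tau * (1.0 - mu ** 2) + 1e-10)
    # Bayesian learning rule: exponential decay of the old natural
    # parameter plus a scaled loss-gradient term.
    return (1.0 - lr) * lam - lr * scale * grad_fn(w)

# Toy usage: fit 5 binary weights to a target sign pattern under a
# quadratic loss, whose gradient is simply (w - target).
target = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
lam = np.zeros(5)
for _ in range(200):
    lam = bayesian_binary_step(lam, lambda w: w - target)
print(np.sign(lam))  # most probable binary weights: matches target
```

Note that when the relaxed weights saturate, the gradient scaling vanishes and the update reduces to a sign-preserving decay of the natural parameters, which loosely mirrors how latent-weight methods such as the Straight-Through Estimator behave.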
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP).
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z)
- A lifted Bregman strategy for training unfolded proximal neural network Gaussian denoisers [8.343594411714934]
Unfolded proximal neural networks (PNNs) form a family of methods that combines deep learning and proximal optimization approaches.
We propose a lifted training formulation based on Bregman distances for unfolded PNNs.
We assess the behaviour of the proposed training approach for PNNs through numerical simulations on image denoising.
arXiv Detail & Related papers (2024-08-16T13:41:34Z)
- Neural Active Learning Beyond Bandits [69.99592173038903]
We study both stream-based and pool-based active learning with neural network approximations.
We propose two algorithms based on the newly designed exploitation and exploration neural networks for stream-based and pool-based active learning.
arXiv Detail & Related papers (2024-04-18T21:52:14Z)
- Discrete Neural Algorithmic Reasoning [18.497863598167257]
We propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states.
Trained with supervision on the algorithm's state transitions, such models are able to align perfectly with the original algorithm.
arXiv Detail & Related papers (2024-02-18T16:03:04Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on back-propagation (BP) for optimization.
Unlike FF, our framework directly outputs a label distribution at each cascaded block and therefore does not require generating additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks [34.263013539187355]
We propose a new algorithm for training deep neural networks (DNNs) with binary weights.
Experimental results demonstrate that our new algorithm offers favorable performance compared to existing approaches (the vanilla straight-through estimator it adapts is sketched after this list).
arXiv Detail & Related papers (2021-12-06T09:12:15Z)
- Deep learning via message passing algorithms based on belief propagation [2.931240348160871]
We present a family of BP-based message-passing algorithms with a reinforcement field that biases towards locally entropic distributions.
These algorithms are capable of training multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired solutions.
arXiv Detail & Related papers (2021-10-27T16:52:26Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks (the line parameterization is sketched after this list).
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
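For the AdaSTE entry above, it may help to see the baseline being adapted: the vanilla Straight-Through Estimator keeps latent real-valued weights, quantizes them with sign() in the forward pass, and passes the gradient through the quantizer as if it were the identity (with the usual hard-tanh clipping). This NumPy sketch is a generic illustration under those assumptions, not AdaSTE's adaptive variant.

```python
import numpy as np

def ste_step(w_real, grad_fn, lr=0.01):
    """One vanilla straight-through-estimator update (illustrative).
    Forward: quantize latent real weights with sign().
    Backward: pass the gradient straight through the quantizer,
    masked where |w_real| > 1 (the usual hard-tanh clipping)."""
    w_bin = np.sign(w_real)             # forward uses binary weights
    g = grad_fn(w_bin)                  # gradient w.r.t. binary weights
    g = g * (np.abs(w_real) <= 1.0)     # straight-through with clipping
    return np.clip(w_real - lr * g, -1.0, 1.0)

# Toy usage: fit binary weights to a target sign pattern.
target = np.array([1.0, -1.0, -1.0, 1.0])
w = np.full(4, 0.01)                    # start slightly off zero
for _ in range(500):
    w = ste_step(w, lambda wb: wb - target)
print(np.sign(w))                       # matches the target signs
```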
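Similarly, for the Learning Neural Network Subspaces entry, one reading of "learning a line of networks" is the parameterization w(a) = (1 - a) * w1 + a * w2, trained by sampling a point on the line at each step. The sketch below illustrates this on a toy quadratic loss; the paper's actual recipe (e.g., the regularizer it uses to keep the endpoints apart) may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def line_subspace_step(w1, w2, grad_fn, lr=0.05):
    """Train a line of models w(a) = (1 - a) * w1 + a * w2 by sampling
    one point on the line per step (a sketch of the idea only)."""
    a = rng.uniform()
    w = (1.0 - a) * w1 + a * w2
    g = grad_fn(w)
    # Chain rule: dL/dw1 = (1 - a) * g and dL/dw2 = a * g.
    return w1 - lr * (1.0 - a) * g, w2 - lr * a * g

# Toy usage: every point on the line should fit the same target,
# so both endpoints are driven toward it.
target = np.array([1.0, 2.0, 3.0])
w1, w2 = rng.normal(size=3), rng.normal(size=3)
for _ in range(2000):
    w1, w2 = line_subspace_step(w1, w2, lambda w: w - target)
print(np.round(w1, 2), np.round(w2, 2))  # both close to the target
```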
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.