A Simple and Efficient Stochastic Rounding Method for Training Neural
Networks in Low Precision
- URL: http://arxiv.org/abs/2103.13445v1
- Date: Wed, 24 Mar 2021 18:47:03 GMT
- Title: A Simple and Efficient Stochastic Rounding Method for Training Neural
Networks in Low Precision
- Authors: Lu Xia, Martijn Anthonissen, Michiel Hochstenbach and Barry Koren
- Abstract summary: Conventional stochastic rounding (CSR) is widely employed in the training of neural networks (NNs).
We introduce an improved stochastic rounding method that is simple and efficient.
The proposed method succeeds in training NNs with 16-bit fixed-point numbers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Conventional stochastic rounding (CSR) is widely employed in the training of
neural networks (NNs), showing promising training results even in low-precision
computations. We introduce an improved stochastic rounding method that is
simple and efficient. The proposed method succeeds in training NNs with 16-bit
fixed-point numbers and provides faster convergence and higher classification
accuracy than both CSR and the deterministic rounding-to-the-nearest method.
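For context, the sketch below illustrates the conventional stochastic rounding (CSR) baseline mentioned in the abstract: each value is rounded to one of its two neighbouring fixed-point grid points with probability proportional to its distance from the other one, so the rounding is unbiased in expectation. This is a minimal NumPy illustration; the choice of 8 fractional bits, the function name, and the omission of saturation to the 16-bit range are assumptions for the example, not the authors' implementation.

```python
import numpy as np

def stochastic_round_fixed_point(x, frac_bits=8, rng=None):
    """Conventional stochastic rounding (CSR) onto a fixed-point grid.

    Each value is mapped to one of its two neighbouring grid points; the
    probability of rounding up equals the fractional distance from the lower
    grid point, so the rounding is unbiased in expectation. Saturation to the
    16-bit range is omitted to keep the sketch short.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits                  # grid spacing is 2**-frac_bits
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    prob_up = scaled - lower                  # in [0, 1): distance to lower grid point
    round_up = rng.random(scaled.shape) < prob_up
    return (lower + round_up) / scale

# With 2 fractional bits, 0.3 rounds to 0.25 about 80% of the time and to
# 0.5 about 20% of the time, so the sample mean stays close to 0.3.
samples = stochastic_round_fixed_point(np.full(100_000, 0.3), frac_bits=2)
print(samples.mean())
```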
Related papers
- A lifted Bregman strategy for training unfolded proximal neural network Gaussian denoisers [8.343594411714934]
Unfolded proximal neural networks (PNNs) form a family of methods that combines deep learning and proximal optimization approaches.
We propose a lifted training formulation based on Bregman distances for unfolded PNNs.
We assess the behaviour of the proposed training approach for PNNs through numerical simulations on image denoising.
arXiv Detail & Related papers (2024-08-16T13:41:34Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) for optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Practical Convex Formulation of Robust One-hidden-layer Neural Network Training [12.71266194474117]
We show that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program.
We derive a convex optimization approach to efficiently solve the "adversarial training" problem.
Our method can be applied to binary classification and regression, and provides an alternative to the current adversarial training methods.
arXiv Detail & Related papers (2021-05-25T22:06:27Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
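As a rough illustration of what learning a "line" of networks means here, the sketch below evaluates the weights at an arbitrary point on the segment between two endpoint parameter sets. The dictionary layout and shapes are hypothetical, and the joint training procedure that makes every point on the segment accurate is not shown.

```python
import numpy as np

def weights_on_line(theta0, theta1, alpha):
    """Weights at position alpha in [0, 1] on the line between two endpoints.

    Only the parameterization theta(alpha) = (1 - alpha)*theta0 + alpha*theta1
    is shown; in the subspace-learning setting the endpoints are trained so
    that every point on the segment is itself an accurate network.
    """
    return {name: (1.0 - alpha) * theta0[name] + alpha * theta1[name]
            for name in theta0}

# Hypothetical endpoints for a single dense layer.
rng = np.random.default_rng(0)
theta0 = {"W": rng.standard_normal((4, 3)), "b": np.zeros(3)}
theta1 = {"W": rng.standard_normal((4, 3)), "b": np.zeros(3)}
midpoint = weights_on_line(theta0, theta1, alpha=0.5)
```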
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network [75.44925576268052]
The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches.
Exact probabilistic inference algorithms are typically applied in training and prediction stages of the CRF model.
We propose to employ a parallelizable approximate variational inference algorithm for the CRF model.
arXiv Detail & Related papers (2020-09-17T12:18:43Z)
- Training Sparse Neural Networks using Compressed Sensing [13.84396596420605]
We develop and test a novel method based on compressed sensing which combines the pruning and training into a single step.
Specifically, we utilize an adaptively weighted $\ell_1$ penalty on the weights during training, which we combine with a generalization of the regularized dual averaging (RDA) algorithm in order to train sparse neural networks.
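To make the penalty concrete, here is a generic proximal-gradient step for an adaptively weighted $\ell_1$ penalty (per-coordinate soft-thresholding). It is only a sketch of the penalty itself under assumed names and step sizes; the paper's RDA-based algorithm is not reproduced here.

```python
import numpy as np

def weighted_l1_prox_step(w, grad, lr, lam, per_weight):
    """One gradient step followed by the proximal map of a weighted l1 penalty.

    The proximal map of lam * sum_i per_weight[i] * |w_i| is coordinate-wise
    soft-thresholding, which drives small weights exactly to zero and thereby
    produces sparsity during training. This is a generic proximal-gradient
    sketch, not the paper's RDA-based method; lr, lam, and per_weight are
    illustrative.
    """
    z = w - lr * grad                          # plain gradient step
    thresh = lr * lam * per_weight             # per-coordinate threshold
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

# Usage with hypothetical values: larger per-weight factors prune harder.
w = np.array([0.50, -0.02, 0.10])
g = np.array([0.10, 0.01, -0.05])
print(weighted_l1_prox_step(w, g, lr=0.1, lam=0.5,
                            per_weight=np.array([1.0, 2.0, 1.0])))
```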
arXiv Detail & Related papers (2020-08-21T19:35:54Z)
- Tune smarter not harder: A principled approach to tuning learning rates for shallow nets [13.203765985718201]
A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z)
- Stochastic gradient descent with random learning rate [0.0]
We propose to optimize neural networks with a uniformly-distributed random learning rate.
By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime.
We provide supporting evidence through experiments on both shallow, fully-connected and deep, convolutional neural networks for image classification on the MNIST and CIFAR10 datasets.
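A minimal sketch of the update rule described above: a plain SGD step whose learning rate is drawn from a uniform distribution. The interval bounds and the per-step sampling are assumptions for illustration; the summary does not specify them.

```python
import numpy as np

def sgd_step_random_lr(params, grads, lr_min=1e-4, lr_max=1e-2, rng=None):
    """One SGD update with a uniformly-distributed random learning rate.

    The interval [lr_min, lr_max] and sampling a fresh rate at every step are
    illustrative assumptions; the summary only states that the learning rate
    is drawn from a uniform distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    lr = rng.uniform(lr_min, lr_max)           # new random rate for this step
    return {name: params[name] - lr * grads[name] for name in params}

# Usage with hypothetical parameters and gradients:
params = {"W": np.random.default_rng(0).standard_normal((3, 2)), "b": np.zeros(2)}
grads = {"W": np.ones((3, 2)), "b": np.ones(2)}
params = sgd_step_random_lr(params, grads)
```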
arXiv Detail & Related papers (2020-03-15T21:36:46Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.