Training neural networks using Metropolis Monte Carlo and an adaptive variant
- URL: http://arxiv.org/abs/2205.07408v1
- Date: Mon, 16 May 2022 01:01:55 GMT
- Title: Training neural networks using Metropolis Monte Carlo and an adaptive variant
- Authors: Stephen Whitelam, Viktor Selin, Ian Benlolo, Isaac Tamblyn
- Abstract summary: We study the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function.
We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool
for training a neural network by minimizing a loss function. We find that, as
expected on theoretical grounds and shown empirically by other authors,
Metropolis Monte Carlo can train a neural net with an accuracy comparable to
that of gradient descent, if not necessarily as quickly. The Metropolis
algorithm does not fail automatically when the number of parameters of a neural
network is large. It can fail when a neural network's structure or neuron
activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo
algorithm, aMC, to overcome these limitations. The intrinsic stochasticity of
the Monte Carlo method allows aMC to train neural networks in which the
gradient is too small to allow training by gradient descent. We suggest that,
as for molecular simulation, Monte Carlo methods offer a complement to
gradient-based methods for training neural networks, allowing access to a
distinct set of network architectures and principles.
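To make the algorithm described in the abstract concrete, the sketch below shows zero-temperature Metropolis Monte Carlo training of a small neural network on a toy regression problem. The architecture, proposal step size, and data are illustrative assumptions rather than the paper's setup, and the adaptive aMC variant, which additionally adjusts the proposal distribution, is not implemented here.

```python
# Minimal sketch (not the authors' code): zero-temperature Metropolis Monte Carlo
# training of a one-hidden-layer network on toy regression data.
# At zero temperature a proposed weight perturbation is accepted only if the loss
# does not increase; at finite temperature T one would instead accept with
# probability min(1, exp(-(new_loss - old_loss) / T)).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) on [-pi, pi] (illustrative assumption)
x = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(x)

# One hidden layer of 16 tanh units, parameters flattened into a single vector
shapes = [(1, 16), (16,), (16, 1), (1,)]
theta = np.concatenate([rng.normal(0.0, 0.5, size=int(np.prod(s))) for s in shapes])

def unpack(params):
    parts, i = [], 0
    for s in shapes:
        n = int(np.prod(s))
        parts.append(params[i:i + n].reshape(s))
        i += n
    return parts

def loss(params):
    w1, b1, w2, b2 = unpack(params)
    pred = np.tanh(x @ w1 + b1) @ w2 + b2
    return float(np.mean((pred - y) ** 2))

sigma = 0.02  # fixed proposal step size; aMC would adapt the proposal distribution
current = loss(theta)
for step in range(20000):
    proposal = theta + rng.normal(0.0, sigma, size=theta.shape)  # perturb all weights
    new = loss(proposal)
    if new <= current:  # zero-temperature acceptance: keep only non-increasing loss
        theta, current = proposal, new

print(f"final mean-squared error: {current:.4f}")
```

Under this acceptance rule the loss is non-increasing by construction; the paper's observation is that such gradient-free updates can reach accuracies comparable to gradient descent, though typically more slowly.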
Related papers
- A foundation for exact binarized morphological neural networks [2.8925699537310137]
Training and running deep neural networks (NNs) often demands substantial computation and energy-intensive specialized hardware.
One way to reduce the computation and power cost is to use binary weight NNs, but these are hard to train because the sign function has a non-smooth gradient.
We present a model based on Mathematical Morphology (MM), which can binarize ConvNets without losing performance under certain conditions.
arXiv Detail & Related papers (2024-01-08T11:37:44Z)
- NeuralFastLAS: Fast Logic-Based Learning from Raw Data [54.938128496934695]
Symbolic rule learners generate interpretable solutions; however, they require the input to be encoded symbolically.
Neuro-symbolic approaches overcome this issue by mapping raw data to latent symbolic concepts using a neural network.
We introduce NeuralFastLAS, a scalable and fast end-to-end approach that trains a neural network jointly with a symbolic learner.
arXiv Detail & Related papers (2023-10-08T12:33:42Z)
- Injecting Logical Constraints into Neural Networks via Straight-Through Estimators [5.6613898352023515]
Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI.
We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be applied to incorporate logical constraints into neural network learning.
arXiv Detail & Related papers (2023-07-10T05:12:05Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Dense Hebbian neural networks: a replica symmetric picture of supervised learning [4.133728123207142]
We consider dense, associative neural networks trained with supervision by a teacher.
We investigate their computational capabilities analytically, via the statistical mechanics of spin glasses, and numerically, via Monte Carlo simulations.
arXiv Detail & Related papers (2022-11-25T13:37:47Z)
- Dense Hebbian neural networks: a replica symmetric picture of unsupervised learning [4.133728123207142]
We consider dense, associative neural networks trained with no supervision.
We investigate their computational capabilities analytically, via a statistical-mechanics approach, and numerically, via Monte Carlo simulations.
arXiv Detail & Related papers (2022-11-25T12:40:06Z)
- A Kernel-Expanded Stochastic Neural Network [10.837308632004644]
Deep neural networks often get trapped in local minima during training.
The kernel-expanded stochastic neural network (K-StoNet) model reformulates the network as a latent variable model.
The model can be easily trained using the imputation-regularized optimization (IRO) algorithm.
arXiv Detail & Related papers (2022-01-14T06:42:42Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation [60.80172153614544]
Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration.
We show that an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.
arXiv Detail & Related papers (2020-05-07T15:57:25Z)