Langevin algorithms for Markovian Neural Networks and Deep Stochastic
control
- URL: http://arxiv.org/abs/2212.12018v1
- Date: Thu, 22 Dec 2022 20:00:11 GMT
- Title: Langevin algorithms for Markovian Neural Networks and Deep Stochastic
control
- Authors: Pierre Bras
- Abstract summary: Stochastic Gradient Descent Langevin Dynamics (SGLD) algorithms are known to improve the training of neural networks in some cases where the neural network is very deep.
We numerically show that Langevin algorithms improve the training on various stochastic control problems like hedging and resource management, and for different choices of gradient descent methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Gradient Descent Langevin Dynamics (SGLD) algorithms, which add
noise to the classic gradient descent, are known to improve the training of
neural networks in some cases where the neural network is very deep. In this
paper we study the possibilities of training acceleration for the numerical
resolution of stochastic control problems through gradient descent, where the
control is parametrized by a neural network. If the control is applied at many
discretization times then solving the stochastic control problem reduces to
minimizing the loss of a very deep neural network. We numerically show that
Langevin algorithms improve the training on various stochastic control problems
like hedging and resource management, and for different choices of gradient
descent methods.
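The update described in the abstract adds Gaussian noise to each ordinary gradient step when training the neural-network control. Below is a minimal PyTorch sketch of such a Langevin-perturbed (SGLD-style) step; the toy network, the surrogate loss, and the hyperparameters `lr` and `sigma` are illustrative assumptions, not the paper's actual control problem or tuning.

```python
# Minimal sketch of an SGLD-style update: an ordinary gradient step plus
# Gaussian noise on every parameter. Network, loss, and hyperparameters
# are illustrative, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a control parametrized by a (deep) neural network.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

lr, sigma = 1e-3, 1e-3          # step size and Langevin noise scale (assumed)

def sgld_step(loss: torch.Tensor) -> None:
    """One Langevin-perturbed gradient step on `policy`."""
    policy.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in policy.parameters():
            noise = sigma * torch.randn_like(p)
            p.add_(-lr * p.grad + noise)

# Toy usage: pretend the simulated control cost is a simple squared output.
x = torch.randn(64, 4)              # simulated states (assumed)
loss = policy(x).pow(2).mean()      # surrogate control cost (assumed)
sgld_step(loss)
```

In practice the noise scale is usually tied to the step size (for instance sqrt(2*lr/beta) for inverse temperature beta) and decayed over training; the sketch keeps it constant for brevity.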
Related papers
- Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
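The blurb does not spell out the implicit update, but the implicit (backward-Euler) gradient step it refers to replaces theta_next = theta - eta * grad_L(theta) by theta_next = theta - eta * grad_L(theta_next). A toy NumPy sketch is below; the stiff quadratic loss, step size, and closed-form inner solve are illustrative assumptions (a general loss would need an iterative inner solver).

```python
# Sketch of an implicit (backward-Euler) gradient step
#     theta_next = theta - eta * grad_L(theta_next).
# For the toy quadratic L(x) = 0.5 x^T A x used here the step has the closed
# form theta_next = (I + eta A)^{-1} theta. A, eta, and the loss are assumed.
import numpy as np

A = np.diag([1.0, 10.0, 100.0])          # stiff toy quadratic
eta = 0.1                                # step size (assumed)

def implicit_step(theta: np.ndarray) -> np.ndarray:
    """One implicit gradient step on L(x) = 0.5 x^T A x."""
    return np.linalg.solve(np.eye(len(theta)) + eta * A, theta)

def explicit_step(theta: np.ndarray) -> np.ndarray:
    """Ordinary (explicit) gradient step, for comparison."""
    return theta - eta * (A @ theta)

theta = np.ones(3)
print(implicit_step(theta))   # stays bounded even in the stiff direction
print(explicit_step(theta))   # the stiff coordinate overshoots (1 - eta*100 = -9)
```

The comparison illustrates why an implicit step can stabilize training: it damps stiff directions instead of amplifying them.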
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Multisymplectic Formulation of Deep Learning Using Mean--Field Type
Control and Nonlinear Stability of Training Algorithm [0.0]
We formulate training of deep neural networks as a hydrodynamics system with a multisymplectic structure.
For that, the deep neural network is modelled using a differential equation and, thereby, mean-field type control is used to train it.
The numerical scheme yields an approximate solution which is also an exact solution of a hydrodynamics system with a multisymplectic structure.
arXiv Detail & Related papers (2022-07-07T23:14:12Z) - Differentially private training of neural networks with Langevin
dynamics for calibrated predictive uncertainty [58.730520380312676]
We show that differentially private gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models.
This represents a serious issue for safety-critical applications, e.g. in medical diagnosis.
arXiv Detail & Related papers (2021-07-09T08:14:45Z) - Convergence rates for gradient descent in the training of
overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
However, it is still unclear why randomly initialized gradient descent algorithms perform so well in training such networks.
arXiv Detail & Related papers (2021-02-23T18:17:47Z) - Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks [77.34726150561087]
We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks.
We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
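The summary gives no algorithmic detail; purely as an illustration of the general idea of a Markov-chain update that keeps weights on a fixed quantization grid, the sketch below moves each weight one grid step against its gradient with a probability proportional to the gradient magnitude. This is not the paper's SMGD update; the grid spacing, learning rate, and toy objective are assumptions.

```python
# Generic sketch of a Markov-chain-style update that keeps weights on a
# quantization grid: each weight jumps one grid step against the gradient
# with probability proportional to |gradient|. Illustration of the general
# idea only, not the SMGD update from the paper.
import numpy as np

rng = np.random.default_rng(0)
delta = 0.25                    # grid spacing (assumed)
lr = 0.1                        # controls jump probabilities (assumed)

w = delta * rng.integers(-4, 5, size=8).astype(float)   # weights on the grid
target = rng.normal(size=8)     # toy objective: L(w) = 0.5 * ||w - target||^2

def smgd_like_step(w):
    g = w - target                                       # gradient of the toy loss
    p_move = np.clip(lr * np.abs(g) / delta, 0.0, 1.0)   # jump probabilities
    move = (rng.random(w.shape) < p_move).astype(float)
    return w - move * delta * np.sign(g)                 # weights stay on the grid

for _ in range(200):
    w = smgd_like_step(w)
print(w)                        # each entry hovers on the grid near target
```

In expectation each step equals an ordinary gradient step of size `lr`, yet the iterates never leave the quantized grid.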
arXiv Detail & Related papers (2020-08-25T15:48:15Z) - Variance Reduction for Deep Q-Learning using Stochastic Recursive
Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a stochastic recursive gradient framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
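The key ingredient named in the title is the stochastic recursive gradient estimator, which updates the running gradient estimate with the difference of two mini-batch gradients instead of recomputing it from scratch. A minimal NumPy sketch on a toy least-squares problem follows; the deep Q-learning machinery of SRG-DQN is omitted, and the data, step size, and batch size are assumptions.

```python
# Sketch of a stochastic recursive gradient estimator (SARAH-style):
#   v_t = grad_i(theta_t) - grad_i(theta_{t-1}) + v_{t-1},
# shown on a toy least-squares problem. The deep Q-learning parts of
# SRG-DQN are omitted; data and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = X @ rng.normal(size=5)                 # toy regression targets

def batch_grad(theta, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / len(idx)

theta = np.zeros(5)
eta = 0.05
v = batch_grad(theta, np.arange(len(X)))   # full gradient at the anchor point
theta_prev = theta.copy()
theta = theta - eta * v

for _ in range(100):
    idx = rng.choice(len(X), size=32, replace=False)
    v = batch_grad(theta, idx) - batch_grad(theta_prev, idx) + v   # recursive update
    theta_prev = theta.copy()
    theta = theta - eta * v

print(np.linalg.norm(X @ theta - y))       # residual shrinks toward zero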
arXiv Detail & Related papers (2020-07-25T00:54:20Z) - Taming neural networks with TUSLA: Non-convex learning via adaptive
stochastic gradient Langevin algorithms [0.0]
We offer a new algorithm based on an appropriately constructed variant of stochastic gradient Langevin dynamics (SGLD).
We also provide a nonasymptotic analysis of the new algorithm's convergence properties.
The roots of the TUSLA algorithm are based on the taming technology for processes with superlinearly growing coefficients developed in \citet{tamed-euler}.
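The "taming" referred to here rescales a possibly superlinearly growing gradient so that a single step can never blow up, in the spirit of the tamed Euler scheme. The sketch below uses a generic taming factor 1/(1 + step * ||grad||); it illustrates the taming idea only and is not the exact TUSLA normalization, and the objective and hyperparameters are assumptions.

```python
# Generic sketch of a tamed stochastic gradient Langevin step: the drift is
# divided by (1 + step * ||grad||) so one update cannot explode, in the spirit
# of the tamed Euler scheme. Not the exact TUSLA coefficient; the objective
# and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def grad(theta):
    # Gradient of the double-well potential f(theta) = theta^4 - theta^2,
    # which grows superlinearly.
    return 4.0 * theta ** 3 - 2.0 * theta

lam, beta = 1e-2, 1e3                          # step size and inverse temperature

def tusla_like_step(theta):
    g = grad(theta)
    tamed = g / (1.0 + lam * np.linalg.norm(g))           # taming factor
    noise = np.sqrt(2.0 * lam / beta) * rng.normal(size=theta.shape)
    return theta - lam * tamed + noise

theta = np.array([10.0])                       # deliberately bad initialization
for _ in range(2000):
    theta = tusla_like_step(theta)
print(theta)                                   # settles near a well at +-1/sqrt(2)
```

Without the taming factor, the same step size applied at theta = 10 would overshoot and diverge; the taming keeps the first updates bounded.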
arXiv Detail & Related papers (2020-06-25T16:06:22Z) - Constraint-Based Regularization of Neural Networks [0.0]
We propose a method for efficiently incorporating constraints into a gradient Langevin framework for the training of deep neural networks.
Appropriately designed, they reduce the vanishing/exploding gradient problem, control weight magnitudes and stabilize deep neural networks.
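The summary does not describe the mechanism; one simple way to picture combining a hard weight constraint with a Langevin-type gradient loop is to project the parameters back onto the feasible set after each noisy step. The sketch below uses a max-norm constraint and projection purely as an illustration; it is not the paper's construction, and all names and values are assumptions.

```python
# Illustrative sketch only (not the paper's method): enforce a max-norm
# constraint on each parameter tensor by projecting back onto the feasible
# set after every noisy (Langevin-type) gradient step.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
lr, sigma, max_norm = 1e-2, 1e-3, 3.0          # assumed hyperparameters

def constrained_langevin_step(loss: torch.Tensor) -> None:
    net.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in net.parameters():
            p.add_(-lr * p.grad + sigma * torch.randn_like(p))
            norm = p.norm()
            if norm > max_norm:                # project back onto the norm ball
                p.mul_(max_norm / norm)

x = torch.randn(32, 8)
loss = net(x).pow(2).mean()                    # toy objective (assumed)
constrained_langevin_step(loss)
```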
arXiv Detail & Related papers (2020-06-17T19:28:41Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z) - Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated in a backward fashion, and the parameters are updated with a proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decreasing and training/validation accuracy.
arXiv Detail & Related papers (2020-02-10T03:26:09Z)
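The last entry above updates parameters through a proximal mapping rather than a plain gradient step. As a reminder of what that operation looks like, the sketch below applies the proximal operator of an L1 penalty (soft-thresholding) after a gradient step on a toy least-squares problem; it illustrates the proximal update only and is not the semi-implicit back propagation scheme itself, and the penalty, data, and step size are assumptions.

```python
# Sketch of a proximal-mapping parameter update: take a gradient step, then
# apply the proximal operator of an L1 penalty (soft-thresholding). This only
# illustrates the proximal step itself, not the paper's semi-implicit back
# propagation scheme; the penalty, data, and step size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))
y = X[:, :3].sum(axis=1)                 # toy sparse regression target

def prox_l1(w, t):
    """Proximal operator of t * ||w||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.zeros(10)
eta, lam = 0.01, 0.1                     # step size and L1 weight (assumed)
for _ in range(500):
    g = X.T @ (X @ w - y) / len(y)       # gradient of the least-squares loss
    w = prox_l1(w - eta * g, eta * lam)  # proximal (ISTA-style) update

print(np.round(w, 2))                    # roughly sparse: large on first 3 coords
```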
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.