A Contour Stochastic Gradient Langevin Dynamics Algorithm for
Simulations of Multi-modal Distributions
- URL: http://arxiv.org/abs/2010.09800v2
- Date: Mon, 23 May 2022 13:27:58 GMT
- Title: A Contour Stochastic Gradient Langevin Dynamics Algorithm for
Simulations of Multi-modal Distributions
- Authors: Wei Deng and Guang Lin and Faming Liang
- Abstract summary: We propose an adaptively weighted stochastic gradient Langevin dynamics (SGLD) algorithm for Bayesian learning in big data statistics.
The proposed algorithm is tested on benchmark datasets including CIFAR10 and CIFAR100.
- Score: 17.14287157979558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an adaptively weighted stochastic gradient Langevin dynamics
algorithm (SGLD), so-called contour stochastic gradient Langevin dynamics
(CSGLD), for Bayesian learning in big data statistics. The proposed algorithm
is essentially a scalable dynamic importance sampler, which automatically
flattens the target distribution so that the simulation of a multi-modal
distribution is greatly facilitated. Theoretically, we prove a stability
condition and establish the asymptotic convergence of the self-adapting
parameter to a unique fixed point, regardless of the non-convexity of the
original energy function; we also present an error analysis for the weighted
averaging estimators. Empirically, the CSGLD algorithm is tested on multiple
benchmark datasets including CIFAR10 and CIFAR100. The numerical results
indicate its superiority in avoiding the local-trap problem when training deep
neural networks.
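To make the sampler concrete, here is a minimal NumPy sketch of a contour SGLD-style update: a Langevin step whose stochastic gradient is rescaled by self-adapting weights defined over a partition of the energy range, followed by a stochastic-approximation update of those weights. The names grad_U and U, the energy bounds u_min/u_max, the bin count m, and all step-size constants are illustrative assumptions; the paper's exact multiplier and weight update are not reproduced here.

import numpy as np

def csgld_sketch(grad_U, U, x0, n_iter=10000, lr=1e-4, tau=1.0, zeta=0.75,
                 m=20, u_min=0.0, u_max=100.0, step_a=1.0, step_b=100.0):
    """Illustrative contour SGLD-style sampler (not the paper's exact formulas)."""
    x = np.asarray(x0, dtype=float)
    theta = np.full(m, 1.0 / m)        # self-adapting weights over energy bins
    du = (u_max - u_min) / m           # width of one energy bin
    samples = []
    for k in range(n_iter):
        j = int(np.clip((U(x) - u_min) / du, 0, m - 1))   # current energy bin
        # gradient multiplier that "flattens" the target in energy space
        grad_log_theta = (np.log(theta[j]) - np.log(theta[max(j - 1, 0)])) / du
        mult = 1.0 + zeta * tau * grad_log_theta
        # Langevin step with the reweighted stochastic gradient
        x = x - lr * mult * grad_U(x) \
            + np.sqrt(2.0 * lr * tau) * np.random.randn(*x.shape)
        # stochastic-approximation update of the self-adapting weights
        omega = step_a / (k + step_b)
        one_hot = np.zeros(m); one_hot[j] = 1.0
        theta = theta + omega * (theta[j] ** zeta) * (one_hot - theta)
        theta = np.clip(theta, 1e-10, None)
        samples.append((x.copy(), theta[j] ** zeta))       # sample, importance weight
    return samples, theta

The weighted averaging estimators mentioned in the abstract would then reweight the stored samples by the returned importance weights.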
Related papers
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been effectively demonstrated on forward and inverse differential equation problems.
However, PINNs run into training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
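As a rough illustration (not the paper's actual scheme for PINNs), an implicit SGD step evaluates the gradient at the new iterate, solving y = x - lr * grad(y); below, the implicit equation is solved approximately by a short inner gradient loop on the proximal objective, and the stiff quadratic is a made-up toy example.

import numpy as np

def implicit_sgd_step(grad, x, lr, inner_steps=50, inner_lr=0.05):
    """One implicit (proximal-style) SGD step: approximately solve y = x - lr*grad(y)
    by gradient descent on the proximal objective 0.5*||y - x||^2 + lr*f(y)."""
    y = x.copy()
    for _ in range(inner_steps):
        y = y - inner_lr * ((y - x) + lr * grad(y))
    return y

# Toy usage: f(y) = 100*y^2, so grad(y) = 200*y.  Explicit SGD with lr = 0.05
# diverges (|1 - lr*200| = 9), while the implicit step contracts toward 0.
grad = lambda y: 200.0 * y
x = np.array([1.0])
for _ in range(20):
    x = implicit_sgd_step(grad, x, lr=0.05)
print(x)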
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradient [6.563379950720334]
We introduce a new Langevin dynamics-based algorithm, called e-TH$\varepsilon$O POULA, to solve stochastic optimization problems with discontinuous stochastic gradients.
Three key applications in finance and insurance are provided, namely, multi-period portfolio optimization, transfer learning in multi-period portfolio optimization, and insurance claim prediction.
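A generic tamed-Langevin step in the spirit of the POULA family is sketched below; the componentwise taming, the step size, and the inverse temperature beta are illustrative assumptions and do not reproduce the exact e-TH$\varepsilon$O POULA scheme.

import numpy as np

def tamed_langevin_step(stoch_grad, theta, lr, beta=1e10):
    """One tamed Langevin optimization step: the stochastic gradient is rescaled
    so a single update stays bounded even for very large (or discontinuous)
    gradients, then small Gaussian exploration noise is added."""
    g = stoch_grad(theta)
    g_tamed = g / (1.0 + np.sqrt(lr) * np.abs(g))   # componentwise taming
    noise = np.sqrt(2.0 * lr / beta) * np.random.randn(*theta.shape)
    return theta - lr * g_tamed + noise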
arXiv Detail & Related papers (2022-10-24T13:10:06Z) - Rigorous dynamical mean field theory for stochastic gradient descent
methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z) - Interacting Contour Stochastic Gradient Langevin Dynamics [22.131194626068027]
We propose an interacting contour stochastic gradient Langevin dynamics (ICSGLD) sampler with efficient interactions.
We show that ICSGLD can be theoretically more efficient than a single-chain CSGLD with an equivalent computational budget.
We also present a novel random-field function, which facilitates the estimation of self-adapting parameters in big data.
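A minimal sketch of the interaction, in the spirit of the CSGLD sketch given after the abstract above: several chains share one vector of self-adapting weights, so every chain's visit to an energy bin updates the common estimate. All constants, and the exact interaction rule, are illustrative assumptions rather than the paper's.

import numpy as np

def icsgld_sketch(grad_U, U, x0s, n_iter=5000, lr=1e-4, tau=1.0, zeta=0.75,
                  m=20, u_min=0.0, u_max=100.0, step_a=1.0, step_b=100.0):
    """Illustrative interacting CSGLD: parallel chains, one shared weight vector."""
    chains = [np.asarray(x, dtype=float) for x in x0s]
    theta = np.full(m, 1.0 / m)
    du = (u_max - u_min) / m
    for k in range(n_iter):
        visit = np.zeros(m)                      # averaged, weighted bin visits
        for i, x in enumerate(chains):
            j = int(np.clip((U(x) - u_min) / du, 0, m - 1))
            mult = 1.0 + zeta * tau * (np.log(theta[j])
                                       - np.log(theta[max(j - 1, 0)])) / du
            chains[i] = x - lr * mult * grad_U(x) \
                        + np.sqrt(2.0 * lr * tau) * np.random.randn(*x.shape)
            visit[j] += (theta[j] ** zeta) / len(chains)
        # one shared stochastic-approximation update driven by all chains
        omega = step_a / (k + step_b)
        theta = theta + omega * (visit - visit.sum() * theta)
        theta = np.clip(theta, 1e-10, None)
    return chains, theta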
arXiv Detail & Related papers (2022-02-20T17:23:09Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
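A minimal sketch of clipped stochastic-gradient estimation on a stream is given below; the clipping radius, step size, and the toy heavy-tailed mean-estimation stream are made-up placeholders, not the paper's setting.

import numpy as np

def clipped_sgd(sample_grad, theta0, stream, lr=0.01, clip=1.0):
    """Streaming estimation with norm-clipped per-sample gradients, so that a
    single heavy-tailed sample cannot dominate a step."""
    theta = np.asarray(theta0, dtype=float)
    for z in stream:                     # one pass over the streaming samples
        g = sample_grad(theta, z)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)        # rescale to the clipping radius
        theta = theta - lr * g
    return theta

# Toy usage: streaming mean estimation under heavy-tailed (Student-t) noise.
rng = np.random.default_rng(0)
stream = 2.0 + rng.standard_t(df=2.0, size=(10000, 3))   # true mean = 2
sample_grad = lambda theta, z: theta - z                  # grad of 0.5*||theta - z||^2
print(clipped_sgd(sample_grad, np.zeros(3), stream, lr=0.01, clip=5.0))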
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
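For reference, the sampler whose convergence is analyzed is plain SGLD; a minimal sketch follows, with the step size and temperature as illustrative placeholders.

import numpy as np

def sgld(stoch_grad_U, x0, n_iter=10000, lr=1e-3, temperature=1.0):
    """Plain stochastic gradient Langevin dynamics: a noisy gradient step plus
    injected Gaussian noise at the chosen temperature."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        x = x - lr * stoch_grad_U(x) \
            + np.sqrt(2.0 * lr * temperature) * np.random.randn(*x.shape)
        samples.append(x.copy())
    return np.array(samples)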
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Stochastic Gradient Langevin Dynamics Algorithms with Adaptive Drifts [8.36840154574354]
We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is biased to enhance escape from saddle points and the bias is adaptively adjusted according to the gradients of past samples.
We demonstrate via numerical examples that the proposed algorithms can significantly outperform the existing SGMCMC algorithms.
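One simple way to realize an adaptively adjusted drift (not necessarily one of the drift functions studied in the paper) is to bias the SGLD drift with a running average of past gradients, as in the sketch below; rho and bias_scale are illustrative assumptions.

import numpy as np

def adaptive_drift_sgld(stoch_grad_U, x0, n_iter=10000, lr=1e-3,
                        temperature=1.0, rho=0.9, bias_scale=1.0):
    """SGLD whose drift is biased by an exponential moving average of past
    gradients (a momentum-like term intended to help escape saddle points)."""
    x = np.asarray(x0, dtype=float)
    avg_grad = np.zeros_like(x)
    samples = []
    for _ in range(n_iter):
        g = stoch_grad_U(x)
        avg_grad = rho * avg_grad + (1.0 - rho) * g        # memory of past gradients
        drift = g + bias_scale * avg_grad                  # adaptively biased drift
        x = x - lr * drift \
            + np.sqrt(2.0 * lr * temperature) * np.random.randn(*x.shape)
        samples.append(x.copy())
    return np.array(samples)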
arXiv Detail & Related papers (2020-09-20T22:03:39Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of the stochasticity in its success is still unclear.
We show that multiplicative noise, which commonly arises due to minibatch variance, leads to heavy-tailed behaviour in the parameters.
A detailed analysis is conducted of how key factors, including step size and data, shape this behaviour, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process that can be extended to a continuous-time stochastic gradient flow.
In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.