Removing the mini-batching error in Bayesian inference using Adaptive
Langevin dynamics
- URL: http://arxiv.org/abs/2105.10347v1
- Date: Fri, 21 May 2021 13:39:39 GMT
- Title: Removing the mini-batching error in Bayesian inference using Adaptive
Langevin dynamics
- Authors: Inass Sekkat, Gabriel Stoltz
- Abstract summary: We advocate using the so-called Adaptive Langevin dynamics, which is a modification of standard inertial Langevin dynamics with a dynamical friction.
We investigate the practical relevance of the assumptions underpinning Adaptive Langevin.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The computational cost of usual Monte Carlo methods for sampling a posteriori
laws in Bayesian inference scales linearly with the number of data points. One
option to reduce it to a fraction of this cost is to resort to mini-batching in
conjunction with unadjusted discretizations of Langevin dynamics, in which case
only a random fraction of the data is used to estimate the gradient. However,
this introduces additional noise into the dynamics and hence a bias in the
invariant measure sampled by the Markov chain. We advocate using the
so-called Adaptive Langevin dynamics, which is a modification of standard
inertial Langevin dynamics with a dynamical friction which automatically
corrects for the increased noise arising from mini-batching. We investigate the
practical relevance of the assumptions underpinning Adaptive Langevin (constant
covariance for the estimation of the gradient), which are not satisfied in
typical models of Bayesian inference; and show how to extend the approach to
more general situations.
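To make the mechanism concrete, here is a minimal sketch (not the authors' implementation) of an SGNHT-style Adaptive Langevin sampler on a toy conjugate Gaussian model, where the exact posterior is known and one can check numerically that the adaptive friction absorbs the extra mini-batch noise. The model, step size h, injected noise amplitude A and batch size are illustrative assumptions.

```python
# Minimal sketch (assumed toy setup, not the authors' code) of an SGNHT-style
# Adaptive Langevin sampler with mini-batched gradients.
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: x_i ~ N(theta, 1), prior theta ~ N(0, sigma0^2).
N, sigma0 = 1_000, 10.0
data = rng.normal(1.5, 1.0, size=N)

def minibatch_grad_U(theta, batch_size=100):
    """Unbiased mini-batch estimate of grad U, with U = -log prior - sum_i log p(x_i | theta)."""
    batch = rng.choice(data, size=batch_size, replace=False)
    return theta / sigma0**2 + (N / batch_size) * np.sum(theta - batch)

h, A, n_steps = 1e-3, 1.0, 100_000      # step size, injected noise amplitude, iterations
theta, p, xi = 0.0, 0.0, A              # position, momentum, adaptive friction
samples = np.empty(n_steps)
for t in range(n_steps):
    p += -xi * p * h - minibatch_grad_U(theta) * h + np.sqrt(2 * A * h) * rng.normal()
    theta += p * h
    xi += (p**2 - 1.0) * h              # Nose-Hoover feedback (kT = 1, unit mass, d = 1)
    samples[t] = theta

burn = n_steps // 5
post_var = 1.0 / (N + 1.0 / sigma0**2)  # exact Gaussian posterior for comparison
post_mean = post_var * data.sum()
print("AdL   mean/var:", samples[burn:].mean(), samples[burn:].var())
print("exact mean/var:", post_mean, post_var)
print("final friction xi:", xi)
```

In this idealised toy the gradient-noise covariance is essentially constant, so the friction settles near the value that restores the target temperature; when the covariance depends on the parameters, as in typical Bayesian models, the extensions discussed in the paper are needed.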
Related papers
- Max-affine regression via first-order methods [7.12511675782289]
The max-affine model ubiquitously arises in applications in signal processing and statistics.
We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for max-affine regression.
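As a rough illustration of the setting analysed above, the following hypothetical sketch fits a max-affine model f(x) = max_j (a_j . x + b_j) with mini-batch SGD, updating only the active affine piece of each sample; dimensions, step size and data are arbitrary choices.

```python
# Hypothetical sketch of mini-batch SGD for max-affine regression,
# f(x) = max_j (a_j . x + b_j); only the active affine piece of each sample
# receives a (sub)gradient update. Dimensions, step size and data are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 5_000, 5, 3
A_true = rng.normal(size=(k, d))
b_true = rng.normal(size=k)
X = rng.normal(size=(n, d))
y = np.max(X @ A_true.T + b_true, axis=1) + 0.01 * rng.normal(size=n)

A, b = rng.normal(size=(k, d)), np.zeros(k)       # random initialisation
step, batch = 0.05, 64
for t in range(20_000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    scores = Xb @ A.T + b                         # shape (batch, k)
    j = np.argmax(scores, axis=1)                 # active affine piece per sample
    resid = scores[np.arange(batch), j] - yb      # prediction error
    for m in range(k):                            # subgradient step, piece by piece
        mask = (j == m)
        if mask.any():
            A[m] -= step * (resid[mask] @ Xb[mask]) / batch
            b[m] -= step * resid[mask].sum() / batch

rmse = np.sqrt(np.mean((np.max(X @ A.T + b, axis=1) - y) ** 2))
print("full-data RMSE after mini-batch SGD:", rmse)
```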
arXiv Detail & Related papers (2023-08-15T23:46:44Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
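A minimal sketch of the finite-particle, time-discretised system behind MFLD ("noisy particle gradient descent") is given below; the double-well confinement V and quadratic interaction W are illustrative choices, not taken from the paper.

```python
# Minimal sketch of a finite-particle, time-discretised mean-field Langevin
# system ("noisy particle gradient descent"); the double-well confinement V and
# quadratic interaction W are illustrative choices, not taken from the paper.
import numpy as np

rng = np.random.default_rng(2)

def grad_V(x):                   # confinement: V(x) = x^4/4 - x^2/2 (double well)
    return x**3 - x

def grad_W(z):                   # pairwise interaction: W(z) = z^2/2 (attractive)
    return z

N, lam, h, n_steps = 500, 0.2, 1e-2, 5_000      # particles, temperature, step size, iterations
x = rng.normal(size=N)
for t in range(n_steps):
    pairwise = x[:, None] - x[None, :]                    # x_i - x_j for all pairs
    drift = grad_V(x) + grad_W(pairwise).mean(axis=1)     # distribution-dependent drift
    x = x - h * drift + np.sqrt(2 * lam * h) * rng.normal(size=N)

# The empirical measure of the particles approximates the minimiser of the
# entropy-regularised free energy, up to finite-particle, time-discretisation
# (and, if used, stochastic-gradient) errors.
print("particle mean:", x.mean(), "  particle std:", x.std())
```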
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids the arbitrary tuning from a mini-batch of samples required by previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Unbiased Estimation using Underdamped Langevin Dynamics [0.0]
We focus on developing an unbiased method based on underdamped Langevin dynamics.
We prove, under standard assumptions, that our estimator is of finite variance and either has finite expected cost, or has finite cost with a high probability.
arXiv Detail & Related papers (2022-06-14T23:05:56Z)
- Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks [0.0]
We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow.
We show that the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK).
We conclude that the damped-deviations viewpoint offers a simple and unifying perspective on the dynamics when optimizing the squared error.
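The damped-deviations picture can be illustrated with a small numerical sketch: under gradient flow on the mean squared error for a kernelised (linearised / NTK-regime) model, the training residual decays along each eigenvector of the kernel Gram matrix at a rate set by the corresponding eigenvalue. The RBF kernel below is a stand-in for an empirical NTK, not the operator constructed in the paper.

```python
# Illustrative sketch of the "damped deviations" picture: residual components
# along eigenvectors of the (assumed) kernel Gram matrix decay exponentially at
# rates given by the eigenvalues under gradient flow on the mean squared error.
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.sort(rng.uniform(-1, 1, size=n))
y = np.sin(3 * np.pi * X)                       # target values on the training sample

# (1/n)-scaled Gram matrix, so that gradient flow on the mean squared error
# gives residual(t) = exp(-K t) residual(0).
K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / 0.1**2) / n
eigvals, eigvecs = np.linalg.eigh(K)
coeffs = eigvecs.T @ y                          # residual at t = 0 in the eigenbasis

for t in [0.0, 10.0, 100.0, 1000.0]:
    resid = eigvecs @ (np.exp(-eigvals * t) * coeffs)     # closed-form residual
    print(f"t = {t:7.1f}   training RMSE = {np.linalg.norm(resid) / np.sqrt(n):.4f}")
```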
arXiv Detail & Related papers (2022-01-12T23:28:41Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
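A small sketch of the setting, assuming random Fourier features and a constant step size, is given below; it is illustrative only and does not reproduce the paper's high-dimensional analysis.

```python
# Sketch of random-features (RF) regression trained with constant step-size SGD;
# the random Fourier feature map, dimensions and step size are illustrative
# assumptions.
import numpy as np

rng = np.random.default_rng(4)
n, d, m = 2_000, 10, 500                        # samples, input dim, random features
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)       # linear teacher plus label noise

W = rng.normal(size=(m, d))                     # fixed random feature weights
b = rng.uniform(0, 2 * np.pi, size=m)
def features(x):                                # random Fourier features
    return np.sqrt(2.0 / m) * np.cos(x @ W.T + b)

Phi = features(X)
theta = np.zeros(m)                             # trainable linear readout
step = 0.5                                      # constant step size
for t in range(50_000):
    i = rng.integers(n)                         # single-sample SGD
    err = Phi[i] @ theta - y[i]
    theta -= step * err * Phi[i]

X_test = rng.normal(size=(1_000, d)) / np.sqrt(d)
print("train MSE:", np.mean((Phi @ theta - y) ** 2))
print("test  MSE:", np.mean((features(X_test) @ theta - X_test @ w_star) ** 2))
```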
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
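The following sketch shows AIS with unadjusted Langevin (ULA) transitions, i.e. no Metropolis-Hastings correction, on a toy Gaussian model with a tractable marginal likelihood; the schedule, step size and model are illustrative assumptions, and dropping MH makes the estimate only approximately unbiased.

```python
# Sketch of annealed importance sampling with unadjusted Langevin transitions
# (no Metropolis-Hastings steps) on an assumed toy conjugate Gaussian model.
import numpy as np

rng = np.random.default_rng(5)

# Toy model: prior N(0, 1) on theta, observations x_i ~ N(theta, 1).
data = rng.normal(1.0, 1.0, size=50)
n_obs = data.size

def log_lik(th):                         # unnormalised Gaussian log-likelihood (constants dropped)
    return -0.5 * np.sum((data[None, :] - th[:, None])**2, axis=1)

def grad_log_target(th, beta):           # gradient of log prior + beta * log likelihood
    return -th + beta * np.sum(data[None, :] - th[:, None], axis=1)

n_chains, n_temps, n_inner, h = 1_000, 100, 5, 0.01
betas = np.linspace(0.0, 1.0, n_temps + 1)
theta = rng.normal(size=n_chains)        # start from the prior
log_w = np.zeros(n_chains)
for k in range(1, n_temps + 1):
    log_w += (betas[k] - betas[k - 1]) * log_lik(theta)          # incremental AIS weight
    for _ in range(n_inner):             # ULA steps targeting the k-th intermediate density
        theta = (theta + h * grad_log_target(theta, betas[k])
                 + np.sqrt(2 * h) * rng.normal(size=n_chains))

log_Z_hat = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
s = data.sum()
log_Z_exact = -0.5 * np.log(1 + n_obs) + 0.5 * s**2 / (1 + n_obs) - 0.5 * np.sum(data**2)
print("AIS/ULA estimate of log Z:", log_Z_hat)
print("exact log Z (same unnormalised likelihood):", log_Z_exact)
```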
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs (Gaussian noise injections), which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
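For concreteness, a hypothetical sketch of GNIs is given below: isotropic Gaussian noise added to the pre-activations of a small network during mini-batch training. The architecture, noise level sigma and regression task are illustrative and not the paper's setup.

```python
# Sketch of Gaussian noise injections (GNIs): isotropic Gaussian noise added to
# the pre-activations of a small network during mini-batch training (assumed toy setup).
import numpy as np

rng = np.random.default_rng(6)
n, d, hdim = 1_000, 5, 16
X = rng.normal(size=(n, d))
y = np.sin(X.sum(axis=1))

W1 = rng.normal(size=(d, hdim)) / np.sqrt(d)
w2 = rng.normal(size=hdim) / np.sqrt(hdim)
step, sigma, batch = 0.05, 0.1, 32

for t in range(5_000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    a = Xb @ W1 + sigma * rng.normal(size=(batch, hdim))    # GNI on the pre-activations
    h_act = np.tanh(a)
    err = h_act @ w2 - yb
    # Backpropagate through the noisy forward pass (the injected noise is held fixed).
    grad_w2 = h_act.T @ err / batch
    grad_a = np.outer(err, w2) * (1 - h_act**2)
    grad_W1 = Xb.T @ grad_a / batch
    w2 -= step * grad_w2
    W1 -= step * grad_W1

print("noise-free train MSE after GNI training:", np.mean((np.tanh(X @ W1) @ w2 - y) ** 2))
```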
arXiv Detail & Related papers (2021-02-13T21:28:09Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in stochastic gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
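The shape difference can be seen directly by comparing the covariance of mini-batch gradient noise with isotropic noise of matched size at a fixed parameter value; the linear-regression setup below only illustrates that covariance structure, not the paper's analysis of the resulting bias.

```python
# Sketch contrasting the *shape* of mini-batch gradient noise with spherical
# Gaussian noise of matched size at a fixed point of an assumed linear
# regression problem.
import numpy as np

rng = np.random.default_rng(7)
n, d, batch = 2_000, 10, 20
X = rng.normal(size=(n, d)) * np.linspace(0.2, 2.0, d)     # anisotropic features
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

w = np.zeros(d)                                            # point at which the noise is measured
draws = []
for _ in range(5_000):                                     # many mini-batch gradient estimates
    idx = rng.choice(n, size=batch, replace=False)
    draws.append(X[idx].T @ (X[idx] @ w - y[idx]) / batch)
cov_minibatch = np.cov(np.array(draws).T)                  # parameter/data-dependent covariance

iso_var = np.trace(cov_minibatch) / d                      # spherical noise of matched total size
print("mini-batch noise eigenvalues :", np.round(np.linalg.eigvalsh(cov_minibatch), 4))
print("matched isotropic eigenvalue :", round(iso_var, 4))
```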
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Amortized variance reduction for doubly stochastic objectives [17.064916635597417]
Approximate inference in complex probabilistic models requires optimisation of doubly stochastic objective functions.
Current approaches do not take into account how mini-batch stochasticity affects sampling stochasticity, resulting in sub-optimal variance reduction.
We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional gradient computations.
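A toy sketch of the underlying control-variate idea follows: a Monte Carlo estimator is recentred with a correlated quantity of known mean, using a near-optimal coefficient. The paper learns the control variate with a recognition network per mini-batch; the hand-picked linear control variate below is only a stand-in.

```python
# Toy sketch of variance reduction with a control variate of known mean
# (hand-picked here, rather than learned by a recognition network).
import numpy as np

rng = np.random.default_rng(8)

def f(z):                                   # integrand whose mean under N(0,1) is wanted
    return np.exp(0.3 * z)

# Pilot run to estimate a near-optimal coefficient c* = Cov(f, h) / Var(h),
# with control variate h(z) = z, whose mean is known to be zero.
pilot = rng.normal(size=100_000)
c = np.cov(f(pilot), pilot)[0, 1] / pilot.var()

z = rng.normal(size=(10_000, 8))            # 10k independent estimates, 8 MC samples each
plain = f(z).mean(axis=1)                   # plain Monte Carlo estimator
controlled = (f(z) - c * z).mean(axis=1)    # control-variate estimator (still unbiased)

print("true value           :", np.exp(0.3**2 / 2))
print("plain MC    mean/std :", plain.mean(), plain.std())
print("controlled  mean/std :", controlled.mean(), controlled.std())
```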
arXiv Detail & Related papers (2020-03-09T13:23:14Z)