Subsampling Error in Stochastic Gradient Langevin Diffusions
- URL: http://arxiv.org/abs/2305.13882v2
- Date: Fri, 26 Apr 2024 21:08:02 GMT
- Title: Subsampling Error in Stochastic Gradient Langevin Diffusions
- Authors: Kexin Jin, Chenguang Liu, Jonas Latz,
- Abstract summary: The Gradient Langevin Dynamics (SGLD) are popularly used to approximate Bayesian posterior distributions in statistical learning procedures with large-scale data.
The first error is introduced by an Euler--Maruyama discretisation of a Langevin diffusion process, the second error comes from the data subsampling that enables its use in large-scale data settings.
We show the exponential ergodicity of SLGDiff and that the Wasserstein distance between the posterior and the limiting distribution of SGLDiff is bounded above by a fractional power of the mean waiting time.
- Score: 0.7900303955225063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Stochastic Gradient Langevin Dynamics (SGLD) are popularly used to approximate Bayesian posterior distributions in statistical learning procedures with large-scale data. As opposed to many usual Markov chain Monte Carlo (MCMC) algorithms, SGLD is not stationary with respect to the posterior distribution; two sources of error appear: The first error is introduced by an Euler--Maruyama discretisation of a Langevin diffusion process, the second error comes from the data subsampling that enables its use in large-scale data settings. In this work, we consider an idealised version of SGLD to analyse the method's pure subsampling error that we then see as a best-case error for diffusion-based subsampling MCMC methods. Indeed, we introduce and study the Stochastic Gradient Langevin Diffusion (SGLDiff), a continuous-time Markov process that follows the Langevin diffusion corresponding to a data subset and switches this data subset after exponential waiting times. There, we show the exponential ergodicity of SLGDiff and that the Wasserstein distance between the posterior and the limiting distribution of SGLDiff is bounded above by a fractional power of the mean waiting time. We bring our results into context with other analyses of SGLD.
Related papers
- Non-asymptotic Analysis of Diffusion Annealed Langevin Monte Carlo for Generative Modelling [1.9526430269580959]
We provide non-asymptotic error bounds for the Langevin dynamics where the path of distributions is defined as Gaussian convolutions of the data distribution as in diffusion models.
We then extend our results to recently proposed heavy-tailed (Student's t) diffusion paths, demonstrating their theoretical properties for heavy-tailed data distributions for the first time.
arXiv Detail & Related papers (2025-02-13T13:18:30Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV)
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - An Improved Analysis of Langevin Algorithms with Prior Diffusion for
Non-Log-Concave Sampling [27.882407333690267]
We show that modified Langevin algorithm with prior diffusion can converge dimension independently for strongly log-concave target distributions.
We also prove that the modified Langevin algorithm can also obtain the dimension-independent convergence of KL divergence with different step size schedules.
arXiv Detail & Related papers (2024-03-10T11:50:34Z) - Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse
Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if tangent space at a denoised sample by Tweedie's formula forms a Krylov subspace, then the CG with the denoised data ensures the data consistency update to remain in the tangent space.
Our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z) - Resolving the Mixing Time of the Langevin Algorithm to its Stationary
Distribution for Log-Concave Sampling [34.66940399825547]
This paper characterizes the mixing time of the Langevin Algorithm to its stationary distribution.
We introduce a technique from the differential privacy literature to the sampling literature.
arXiv Detail & Related papers (2022-10-16T05:11:16Z) - A sharp uniform-in-time error estimate for Stochastic Gradient Langevin
Dynamics [11.437892757715877]
We establish a sharp uniform-in-time error estimate for the Gradient Langevin Dynamics (SGLD)
We obtain a uniform-in-time $O(eta2)$ bound for the KL-divergence between the SGLD iteration and the Langevin diffusion.
arXiv Detail & Related papers (2022-07-19T14:38:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Federated Stochastic Gradient Langevin Dynamics [12.180900849847252]
gradient MCMC methods, such as gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling.
We propose conducive gradients, a simple mechanism that combines local likelihood approximations to correct gradient updates.
We demonstrate that our approach can handle delayed communication rounds, converging to the target posterior in cases where DSGLD fails.
arXiv Detail & Related papers (2020-04-23T15:25:09Z) - Generative Modeling with Denoising Auto-Encoders and Langevin Sampling [88.83704353627554]
We show that both DAE and DSM provide estimates of the score of the smoothed population density.
We then apply our results to the homotopy method of arXiv:1907.05600 and provide theoretical justification for its empirical success.
arXiv Detail & Related papers (2020-01-31T23:50:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.