Time-independent Generalization Bounds for SGLD in Non-convex Settings
- URL: http://arxiv.org/abs/2111.12876v1
- Date: Thu, 25 Nov 2021 02:31:52 GMT
- Title: Time-independent Generalization Bounds for SGLD in Non-convex Settings
- Authors: Tyler Farghly, Patrick Rebeschini
- Abstract summary: We establish generalization error bounds for stochastic gradient Langevin dynamics (SGLD) with constant learning rate under the assumptions of dissipativity and smoothness.
Our analysis also supports variants of SGLD that use different discretization methods, incorporate Euclidean projections, or use non-isotropic noise.
- Score: 23.833787505938858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We establish generalization error bounds for stochastic gradient Langevin
dynamics (SGLD) with constant learning rate under the assumptions of
dissipativity and smoothness, a setting that has received increased attention
in the sampling/optimization literature. Unlike existing bounds for SGLD in
non-convex settings, ours are time-independent and decay to zero as the sample
size increases. Using the framework of uniform stability, we establish
time-independent bounds by exploiting the Wasserstein contraction property of
the Langevin diffusion, which also allows us to circumvent the need to bound
gradients using Lipschitz-like assumptions. Our analysis also supports variants
of SGLD that use different discretization methods, incorporate Euclidean
projections, or use non-isotropic noise.
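As a concrete illustration of the algorithm the abstract analyzes, the following is a minimal sketch of one constant-step-size SGLD update, including the optional Euclidean projection and non-isotropic noise variants mentioned above. The function names, the projection radius, and the noise covariance are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def sgld_step(theta, grad_fn, step_size, beta=1.0, radius=None, noise_cov=None, rng=None):
    """One constant-step-size SGLD update (illustrative sketch).

    theta     : current iterate (d-dimensional array)
    grad_fn   : returns a stochastic gradient of the empirical risk at theta
    step_size : constant learning rate
    beta      : inverse temperature of the Langevin noise
    radius    : if set, Euclidean projection onto the centered ball of this radius
    noise_cov : if set, covariance matrix for non-isotropic Gaussian noise
    """
    rng = rng or np.random.default_rng()
    d = theta.shape[0]
    # Gaussian noise: isotropic by default, or with a user-supplied covariance.
    if noise_cov is None:
        noise = rng.standard_normal(d)
    else:
        noise = rng.multivariate_normal(np.zeros(d), noise_cov)
    # Euler-Maruyama discretization of the Langevin diffusion.
    theta_next = theta - step_size * grad_fn(theta) + np.sqrt(2.0 * step_size / beta) * noise
    # Optional Euclidean projection onto a centered ball.
    if radius is not None:
        norm = np.linalg.norm(theta_next)
        if norm > radius:
            theta_next = theta_next * (radius / norm)
    return theta_next
```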
Related papers
- Time-Independent Information-Theoretic Generalization Bounds for SGLD [4.73194777046253]
We provide novel information-theoretic generalization bounds for stochastic gradient Langevin dynamics (SGLD).
Our bounds are based on the assumptions of smoothness and dissipativity, and do not grow exponentially with time.
arXiv Detail & Related papers (2023-11-02T07:42:23Z) - Generalization Bounds for Label Noise Stochastic Gradient Descent [0.0]
We establish generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings.
Our analysis offers insights into the effect of label noise.
arXiv Detail & Related papers (2023-11-01T03:51:46Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
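To make the distribution-dependent drift concrete, here is a generic finite-particle, time-discretized sketch of MFLD in which the drift is approximated by the empirical measure of the particles. The confining potential, the pairwise interaction kernel, and their names are illustrative assumptions, not the paper's specific functional.

```python
import numpy as np

def mfld_step(particles, grad_V, grad_W, step_size, beta=1.0, rng=None):
    """One time-discretized step of finite-particle mean-field Langevin dynamics.

    particles : (n, d) array, one row per particle
    grad_V    : gradient of a confining potential V(x)
    grad_W    : gradient of a pairwise interaction kernel W(x - y); averaging it
                over the particles approximates the distribution-dependent drift
    """
    rng = rng or np.random.default_rng()
    n, d = particles.shape
    drift = np.zeros_like(particles)
    for i in range(n):
        # Distribution-dependent drift approximated by the empirical measure.
        interaction = np.mean(
            [grad_W(particles[i] - particles[j]) for j in range(n)], axis=0
        )
        drift[i] = grad_V(particles[i]) + interaction
    noise = rng.standard_normal((n, d))
    return particles - step_size * drift + np.sqrt(2.0 * step_size / beta) * noise
```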
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
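For context, the following is a minimal sketch of random features regression trained with constant-step-size SGD. The random ReLU feature map, the constant step size, and the function name are illustrative choices, not the paper's setup.

```python
import numpy as np

def train_rf_regression_sgd(X, y, n_features=512, step_size=0.01, n_epochs=5, seed=0):
    """Random features regression fit with constant-step-size SGD (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((n_features, d)) / np.sqrt(d)  # frozen random weights
    theta = np.zeros(n_features)                           # trainable output layer
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            phi = np.maximum(W @ X[i], 0.0)                # random ReLU features
            residual = phi @ theta - y[i]
            theta -= step_size * residual * phi            # SGD on the squared loss
    return W, theta
```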
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - On Learning Rates and Schr\"odinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the behavior as the learning rate tends to zero holds for a broad class of functions beyond neural networks.
arXiv Detail & Related papers (2020-04-15T09:52:37Z) - Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin
Dynamics [27.097121544378528]
Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective.
NSGLD is based on the discretization of a non-reversible Langevin diffusion.
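As an illustration of the non-reversible variant, here is a minimal sketch of one NSGLD update, where a fixed skew-symmetric matrix perturbs the drift of the underlying Langevin diffusion. The specific choice of the matrix J and the function name are illustrative assumptions.

```python
import numpy as np

def nsgld_step(theta, grad_fn, step_size, J, beta=1.0, rng=None):
    """One NSGLD update: SGLD with a non-reversible drift (I + J) @ grad,
    where J is a fixed skew-symmetric matrix (J = -J.T). Sketch only."""
    rng = rng or np.random.default_rng()
    d = theta.shape[0]
    grad = grad_fn(theta)
    drift = grad + J @ grad  # non-reversible perturbation of the drift
    noise = rng.standard_normal(d)
    return theta - step_size * drift + np.sqrt(2.0 * step_size / beta) * noise
```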
arXiv Detail & Related papers (2020-04-06T17:11:03Z) - Dimension-free convergence rates for gradient Langevin dynamics in RKHS [47.198067414691174]
Gradient Langevin dynamics (GLD) and SGLD have attracted considerable attention lately.
We provide a convergence analysis of GLD and SGLD when the underlying space is infinite dimensional.
arXiv Detail & Related papers (2020-02-29T17:14:13Z)