Tuning Stochastic Gradient Algorithms for Statistical Inference via
Large-Sample Asymptotics
- URL: http://arxiv.org/abs/2207.12395v3
- Date: Thu, 20 Jul 2023 16:21:58 GMT
- Title: Tuning Stochastic Gradient Algorithms for Statistical Inference via
Large-Sample Asymptotics
- Authors: Jeffrey Negrea, Jun Yang, Haoyue Feng, Daniel M. Roy, Jonathan H.
Huggins
- Abstract summary: The tuning of stochastic gradient algorithms is often based on trial-and-error rather than generalizable theory.
We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters.
We lay the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms.
- Score: 18.93569692490218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The tuning of stochastic gradient algorithms (SGAs) for optimization and
sampling is often based on heuristics and trial-and-error rather than
generalizable theory. We address this theory--practice gap by characterizing
the large-sample statistical asymptotics of SGAs via a joint
step-size--sample-size scaling limit. We show that iterate averaging with a
large fixed step size is robust to the choice of tuning parameters and
asymptotically has covariance proportional to that of the MLE sampling
distribution. We also prove a Bernstein--von Mises-like theorem to guide
tuning, including for generalized posteriors that are robust to model
misspecification. Numerical experiments validate our results and
recommendations in realistic finite-sample regimes. Our work lays the
foundation for a systematic analysis of other stochastic gradient Markov chain
Monte Carlo algorithms for a wide range of models.
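As an informal illustration of the averaging recommendation, the following sketch (a toy example with illustrative settings, not the paper's exact step-size and sample-size scaling) runs constant-step-size SGD with Polyak-Ruppert iterate averaging on a linear-regression MLE problem and compares the variability of the averaged iterate across independent datasets with the MLE sampling variance.

```python
import numpy as np

# Minimal sketch: constant-step-size SGD with post-burn-in iterate averaging on a
# toy linear-regression MLE problem. All settings below are illustrative.
rng = np.random.default_rng(0)
d, n, sigma = 3, 500, 1.0
n_rep, n_pass, burn_pass, step = 200, 20, 10, 1.0
theta_true = np.array([1.0, -0.5, 0.25])

def averaged_sgd(X, y, gamma):
    """Fixed-step SGD on the least-squares loss; average the iterates after burn-in."""
    theta, total, count = np.zeros(d), np.zeros(d), 0
    for p in range(n_pass):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ theta - y[i]) * X[i]   # per-example gradient
            theta -= (gamma / len(y)) * grad      # fixed step size with 1/n scaling
            if p >= burn_pass:
                total, count = total + theta, count + 1
    return total / count

estimates = []
for _ in range(n_rep):                            # fresh dataset for each replication
    X = rng.standard_normal((n, d))
    y = X @ theta_true + sigma * rng.standard_normal(n)
    estimates.append(averaged_sgd(X, y, step))
emp_var = np.var(np.array(estimates), axis=0)

# The MLE (OLS) sampling variance is roughly sigma^2 / n per coordinate here, since E[x x^T] = I.
print("n * var(averaged SGD):", np.round(n * emp_var, 2))
print("n * MLE sampling variance:", sigma**2)
```

On a typical run the two printed quantities are of comparable size, consistent with the claim that the covariance of the averaged iterate is proportional to that of the MLE sampling distribution.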
Related papers
- Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers.
Our framework achieves iteration complexity of $\tilde{O}(d^2/\epsilon)$.
We also provide a detailed analysis of Denoising Diffusion Implicit Model (DDIM)-type samplers.
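For readers unfamiliar with DDIM-type deterministic samplers, here is a minimal toy sketch (not the paper's analysis): a one-dimensional Gaussian data distribution for which the optimal noise predictor has a closed form, sampled with the deterministic DDIM update; the noise schedule and all settings are illustrative.

```python
import numpy as np

# Toy DDIM-type deterministic sampler for 1-D Gaussian data N(mu0, s0^2), using the
# closed-form optimal noise predictor. The noise schedule is illustrative.
rng = np.random.default_rng(0)
mu0, s0 = 2.0, 0.5
T = 100
betas = np.linspace(1e-4, 0.2, T)
abar = np.cumprod(1.0 - betas)                    # \bar{alpha}_t

def eps_hat(x, t):
    """Optimal noise predictor E[eps | x_t] when the data distribution is Gaussian."""
    a = abar[t]
    var_t = a * s0**2 + (1.0 - a)                 # marginal variance of x_t
    x0_hat = mu0 + (np.sqrt(a) * s0**2 / var_t) * (x - np.sqrt(a) * mu0)
    return (x - np.sqrt(a) * x0_hat) / np.sqrt(1.0 - a)

x = rng.standard_normal(100_000)                  # start near the terminal marginal
for t in range(T - 1, 0, -1):
    e = eps_hat(x, t)
    x0_hat = (x - np.sqrt(1.0 - abar[t]) * e) / np.sqrt(abar[t])
    x = np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1.0 - abar[t - 1]) * e   # DDIM (eta = 0) step

print("sample mean and std:", x.mean(), x.std())  # should be close to mu0 = 2.0 and s0 = 0.5
```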
arXiv Detail & Related papers (2024-10-18T07:37:36Z)
- Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics [30.324051162373973]
We consider the estimation of regression coefficients and the signal-to-noise ratio in high-dimensional Generalized Linear Models (GLMs).
We derive Consistent and Asymptotically Normal (CAN) estimators of our targets of inference.
We complement our theoretical results with numerical experiments and comparisons with existing literature.
arXiv Detail & Related papers (2024-08-12T12:43:30Z)
- Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality [5.817158625734484]
Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistics and machine learning.
Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings.
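A minimal sketch of iterate averaging for streaming linear regression is given below; the polynomially increasing weights are purely illustrative and are not necessarily the weighting scheme proposed in the paper.

```python
import numpy as np

# Streaming linear regression solved with SGD, comparing the last iterate with a
# plain (uniform) Polyak-Ruppert average and a polynomially weighted average.
# The weights w_t = t^2 are illustrative only.
rng = np.random.default_rng(1)
d, n_steps, gamma0 = 5, 50_000, 0.05
theta_true = rng.standard_normal(d)

theta = np.zeros(d)
uniform_avg = np.zeros(d)
weighted_avg, weight_sum = np.zeros(d), 0.0
for t in range(1, n_steps + 1):
    x = rng.standard_normal(d)                        # streaming design vector
    y = x @ theta_true + rng.standard_normal()        # noisy response
    grad = (x @ theta - y) * x
    theta -= gamma0 / np.sqrt(t) * grad               # decaying step size
    uniform_avg += (theta - uniform_avg) / t          # incremental uniform average
    weight_sum += t ** 2
    weighted_avg += (t ** 2 / weight_sum) * (theta - weighted_avg)   # incremental weighted average

for name, est in [("last iterate", theta), ("uniform average", uniform_avg), ("weighted average", weighted_avg)]:
    print(f"{name:16s} error: {np.linalg.norm(est - theta_true):.4f}")
```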
arXiv Detail & Related papers (2023-07-13T17:29:01Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
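A generic sketch of the clipping idea (illustrative parameters, not the tuned procedure analyzed in the paper): norm-clipped stochastic gradient steps for streaming mean estimation under heavy-tailed noise, compared with the plain running sample mean.

```python
import numpy as np

# Norm-clipped SGD for streaming mean estimation with heavy-tailed (Student-t) noise,
# compared against the running sample mean. Clipping level and step sizes are illustrative.
rng = np.random.default_rng(0)
p, n = 5, 100_000
mu_true = np.ones(p)

def clip(v, lam):
    norm = np.linalg.norm(v)
    return v if norm <= lam else (lam / norm) * v

theta_clipped = np.zeros(p)
theta_mean = np.zeros(p)
for t in range(1, n + 1):
    x = mu_true + rng.standard_t(df=2.1, size=p)      # heavy-tailed streaming sample
    eta = 1.0 / t
    theta_clipped -= eta * clip(theta_clipped - x, lam=5.0)   # clipped gradient step
    theta_mean += eta * (x - theta_mean)                      # plain running sample mean

print("clipped SGD error:", np.linalg.norm(theta_clipped - mu_true))
print("sample mean error:", np.linalg.norm(theta_mean - mu_true))
```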
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models [0.2741266294612776]
We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem.
We show that (small-batch) momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly.
In the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum.
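A rough illustration of this comparison (illustrative parameters, not the paper's scaling limit): minibatch SGD versus heavy-ball momentum on a random least-squares problem, with the momentum step size rescaled by (1 - beta) so that the effective step sizes match.

```python
import numpy as np

# Minibatch SGD versus heavy-ball momentum on a random least-squares problem.
# The momentum run uses step size eta * (1 - beta) so the effective step sizes match;
# problem sizes and step sizes are illustrative.
rng = np.random.default_rng(0)
n, d, batch = 2000, 500, 32
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star + 0.1 * rng.standard_normal(n)

def loss(x):
    r = A @ x - b
    return 0.5 * (r @ r) / n

def run(eta, beta, n_iter=4000):
    x = np.zeros(d)
    x_prev = x.copy()
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=batch)
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch      # minibatch gradient
        x_new = x - eta * g + beta * (x - x_prev)         # heavy-ball update (beta = 0 is plain SGD)
        x_prev, x = x, x_new
    return loss(x)

beta = 0.9
print("SGD loss:     ", run(eta=0.02, beta=0.0))
print("momentum loss:", run(eta=0.02 * (1 - beta), beta=beta))
```

With matched effective step sizes, the two final losses should come out at a similar level, in line with the "no actual performance improvement" finding.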
arXiv Detail & Related papers (2021-06-07T15:08:24Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
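To convey the flavor of the analytic-ELBO claim with a much simpler variational family than the paper's selective-SPNs, the sketch below uses a tiny Ising-style model whose unnormalized log-density is a polynomial in the spins, so its expectation under a fully factorized Bernoulli distribution has a closed form; a Monte Carlo estimate is included as a check. This is only an analogy, not the paper's construction.

```python
import numpy as np

# Tiny Ising-style model: log p~(x) = sum_{i<j} J_ij x_i x_j + sum_i h_i x_i with x_i in {-1, +1}.
# Under a fully factorized Bernoulli variational distribution q, E_q[log p~(x)] is a closed-form
# polynomial in the means m_i = E_q[x_i], so the ELBO (up to the constant log Z) is analytic.
rng = np.random.default_rng(0)
n_vars = 4
J = np.triu(rng.normal(scale=0.3, size=(n_vars, n_vars)), k=1)   # couplings, zero diagonal
h = rng.normal(scale=0.3, size=n_vars)                           # external fields

q = rng.uniform(0.2, 0.8, size=n_vars)     # q(x_i = +1)
m = 2.0 * q - 1.0                          # E_q[x_i]

# Analytic value of E_q[x^T J x + h^T x] + H(q), using E_q[x_i x_j] = m_i m_j for i != j.
e_logp = m @ J @ m + h @ m
entropy = -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q)).sum()
elbo_analytic = e_logp + entropy

# Monte Carlo check of the same quantity.
samples = np.where(rng.uniform(size=(100_000, n_vars)) < q, 1.0, -1.0)
e_logp_mc = np.mean(np.einsum("ni,ij,nj->n", samples, J, samples) + samples @ h)

print("analytic ELBO + log Z:   ", elbo_analytic)
print("Monte Carlo ELBO + log Z:", e_logp_mc + entropy)
```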
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
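The auxiliary slice-variable idea appears in its simplest textbook form below (a generic 1-D slice sampler with stepping-out and shrinkage, not the paper's CRM-specific construction).

```python
import numpy as np

# Generic 1-D slice sampler (stepping-out and shrinkage, Neal 2003) for an
# illustrative two-component Gaussian mixture target.
rng = np.random.default_rng(4)

def log_density(x):
    """Unnormalized log-density: equal mixture of N(-2, 1) and N(+2, 1)."""
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

def slice_sample(x0, n_samples, w=1.0):
    samples, x = [], x0
    for _ in range(n_samples):
        logy = log_density(x) + np.log(rng.uniform())   # auxiliary slice variable (log scale)
        left = x - w * rng.uniform()
        right = left + w
        while log_density(left) > logy:                  # step out until outside the slice
            left -= w
        while log_density(right) > logy:
            right += w
        while True:                                      # shrink until a point on the slice is found
            x_prop = rng.uniform(left, right)
            if log_density(x_prop) > logy:
                x = x_prop
                break
            if x_prop < x:
                left = x_prop
            else:
                right = x_prop
        samples.append(x)
    return np.array(samples)

draws = slice_sample(0.0, 5000)
print("sample mean and std:", draws.mean(), draws.std())   # roughly 0 and sqrt(5) for this target
```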
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Multi-index Antithetic Stochastic Gradient Algorithm [0.0]
Stochastic Gradient Algorithms (SGAs) are ubiquitous in computational statistics, machine learning and optimisation.
Recent years have brought an influx of interest in SGAs, and the non-asymptotic analysis of their bias is now well-developed.
We show that MASGA is an optimal estimator from the mean square error-computational cost perspective within the class of Monte Carlo estimators.
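As background for the multilevel ideas MASGA builds on, here is a plain multilevel Monte Carlo sketch with coupled Euler discretizations of an SDE (not the multi-index antithetic estimator itself, and with a fixed number of samples per level rather than an optimal allocation).

```python
import numpy as np

# Plain multilevel Monte Carlo for E[X_T] of a geometric Brownian motion, with coupled
# Euler paths across levels; a fixed number of samples per level is used for simplicity.
rng = np.random.default_rng(3)
mu, sig, T, X0 = 0.05, 0.2, 1.0, 1.0

def level_estimator(level, n_samples):
    """Estimate E[X_T] at level 0, or E[X_T^fine - X_T^coarse] at higher levels."""
    nf = 2 ** level
    hf = T / nf
    dW = rng.standard_normal((n_samples, nf)) * np.sqrt(hf)
    Xf = np.full(n_samples, X0)
    for k in range(nf):                                  # fine Euler path
        Xf = Xf + mu * Xf * hf + sig * Xf * dW[:, k]
    if level == 0:
        return Xf.mean()
    nc, hc = nf // 2, 2.0 * hf
    dWc = dW[:, 0::2] + dW[:, 1::2]                      # coarse increments from the fine ones
    Xc = np.full(n_samples, X0)
    for k in range(nc):                                  # coupled coarse Euler path
        Xc = Xc + mu * Xc * hc + sig * Xc * dWc[:, k]
    return (Xf - Xc).mean()

L, N = 6, 20_000
mlmc_estimate = sum(level_estimator(l, N) for l in range(L + 1))
print("MLMC estimate of E[X_T]:", mlmc_estimate)
print("exact value exp(mu * T):", np.exp(mu * T))
```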
arXiv Detail & Related papers (2020-06-10T22:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of the information above and is not responsible for any consequences of its use.