Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and
Optimality
- URL: http://arxiv.org/abs/2307.06915v2
- Date: Tue, 18 Jul 2023 17:28:02 GMT
- Title: Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and
Optimality
- Authors: Ziyang Wei, Wanrong Zhu, Wei Biao Wu
- Abstract summary: Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning.
Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings.
- Score: 5.817158625734484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Gradient Descent (SGD) is one of the simplest and most popular
algorithms in modern statistical and machine learning due to its computational
and memory efficiency. Various averaging schemes have been proposed to
accelerate the convergence of SGD in different settings. In this paper, we
explore a general averaging scheme for SGD. Specifically, we establish the
asymptotic normality of a broad range of weighted averaged SGD solutions and
provide asymptotically valid online inference approaches. Furthermore, we
propose an adaptive averaging scheme that exhibits both optimal statistical
rate and favorable non-asymptotic convergence, drawing insights from the
optimal weight for the linear model in terms of non-asymptotic mean squared
error (MSE).
Related papers
- Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions [65.38208224389027]
This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible.
Our approach is based on the estimation of the Cumulative Sum statistics, which is known to produce optimal performance.
arXiv Detail & Related papers (2024-10-18T17:13:29Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Adaptive Linear Estimating Equations [5.985204759362746]
In this paper, we propose a general method for constructing debiased estimator.
It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of normality.
A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance.
arXiv Detail & Related papers (2023-07-14T12:55:47Z) - Accelerated stochastic approximation with state-dependent noise [7.4648480208501455]
We consider a class of smooth convex optimization problems under general assumptions on the quadratic noise in the gradient observation.
Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics.
We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate.
arXiv Detail & Related papers (2023-07-04T06:06:10Z) - Online Statistical Inference for Contextual Bandits via Stochastic
Gradient Descent [10.108468796986074]
We study the online statistical inference of model parameters in a contextual bandit framework of decision-making.
We propose a general framework for online and adaptive data collection environment that can update decision rules via weighted gradient descent.
arXiv Detail & Related papers (2022-12-30T18:57:08Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models.
We present a parameteric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer number of steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Tuning Stochastic Gradient Algorithms for Statistical Inference via
Large-Sample Asymptotics [18.93569692490218]
tuning of gradient algorithms often based on trial-and-error rather than generalizable theory.
We show that averaging with a large fixed step size is robust to the choice of tuning parameters.
We lay the foundation for a systematic analysis of other gradient Monte Carlo algorithms.
arXiv Detail & Related papers (2022-07-25T17:58:09Z) - Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality [8.771678221101368]
We analyze an approximation for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence.
We show that under mild assumptions, the deviation between the iterate of the algorithm and its solution isally normal.
We also show that the performance of the algorithm with averaging is locally minimax optimal.
arXiv Detail & Related papers (2022-07-09T01:44:17Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.