Smoothed SGD for quantiles: Bahadur representation and Gaussian approximation
- URL: http://arxiv.org/abs/2505.13299v1
- Date: Mon, 19 May 2025 16:19:44 GMT
- Title: Smoothed SGD for quantiles: Bahadur representation and Gaussian approximation
- Authors: Likai Chen, Georg Keilbar, Wei Biao Wu
- Abstract summary: We consider the estimation of quantiles via a smoothed version of the stochastic gradient descent (SGD) algorithm. We derive non-asymptotic tail probability bounds for the smoothed SGD quantile estimate. Numerical studies show good finite sample behavior for our theoretical results.
- Score: 4.2330023661329355
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper considers the estimation of quantiles via a smoothed version of the stochastic gradient descent (SGD) algorithm. By smoothing the score function in the conventional SGD quantile algorithm, we achieve monotonicity in the quantile level in that the estimated quantile curves do not cross. We derive non-asymptotic tail probability bounds for the smoothed SGD quantile estimate both for the case with and without Polyak-Ruppert averaging. For the latter, we also provide a uniform Bahadur representation and a resulting Gaussian approximation result. Numerical studies show good finite sample behavior for our theoretical results.
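The recursion behind this approach can be sketched compactly. Below is a minimal illustration of a smoothed SGD quantile update with Polyak-Ruppert averaging, assuming a Gaussian-CDF smoothing kernel, a polynomially decaying step size, and a fixed bandwidth; these choices and the function name are illustrative and not the paper's exact specification.

```python
# Minimal sketch: smoothed SGD quantile estimation with Polyak-Ruppert averaging.
# Kernel, step-size schedule, and bandwidth below are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def smoothed_sgd_quantile(samples, tau, h=0.1, c=1.0, alpha=0.6, theta0=0.0):
    """Streaming estimate of the tau-quantile from an i.i.d. sample stream."""
    theta = theta0
    theta_bar = 0.0  # Polyak-Ruppert average of the iterates
    for t, x in enumerate(samples, start=1):
        gamma = c * t ** (-alpha)                 # decaying step size
        score = norm.cdf((theta - x) / h) - tau   # smoothed version of 1{x <= theta} - tau
        theta -= gamma * score                    # smoothed SGD iterate
        theta_bar += (theta - theta_bar) / t      # running average of the iterates
    return theta, theta_bar

# Example: estimate three quantiles of a standard normal stream.
rng = np.random.default_rng(0)
stream = rng.standard_normal(50_000)
for tau in (0.25, 0.5, 0.75):
    _, avg = smoothed_sgd_quantile(stream, tau)
    print(f"tau={tau}: averaged estimate {avg:.3f}, true {norm.ppf(tau):.3f}")
```

Replacing the indicator with a smooth kernel CDF is what yields a score function that is monotone in the quantile level, so the estimated quantile curves do not cross.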
Related papers
- Stochastic Quantum Sampling for Non-Logconcave Distributions and
Estimating Partition Functions [13.16814860487575]
We present quantum algorithms for sampling from non-logconcave probability distributions.
$f$ can be written as a finite sum $f(x) := \frac{1}{N}\sum_{k=1}^{N} f_k(x)$.
arXiv Detail & Related papers (2023-10-17T17:55:32Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norm serve as clipping thresholds, which makes the iterates robust to heavy-tailed gradients and outliers.
We propose an efficient implementation of the algorithm using rolling quantile estimates; a minimal sketch of this idea appears after the list below.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that stochastic gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Computing the Variance of Shuffling Stochastic Gradient Algorithms via
Power Spectral Density Analysis [6.497816402045099]
Two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGDRR) and shuffle-once (SGD-SO).
We study the stationary variances of SGD, SGDRR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations.
arXiv Detail & Related papers (2022-06-01T17:08:04Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a stochastic Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate stochastic gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Quantum Sub-Gaussian Mean Estimator [0.0]
We present a new quantum algorithm for estimating the mean of a real-valued random variable.
Our estimator achieves a nearly-optimal quadratic speedup over the number of classical i.i.d. samples.
arXiv Detail & Related papers (2021-08-27T08:34:26Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms [39.746670539407084]
We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$.
We prove an information theoretic lower bound on the number of gradient queries of the log density needed.
arXiv Detail & Related papers (2020-02-01T23:46:35Z)
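As referenced in the Gradient Quantile Clipping entry above, the following is a minimal sketch of quantile-based gradient norm clipping, assuming a rolling buffer of recent gradient norms and a fixed quantile level; the function name, buffer size, and step size are illustrative and not taken from that paper.

```python
# Minimal sketch: SGD step with the gradient norm clipped at a quantile of
# recent gradient norms. Buffer size, quantile level, and learning rate are
# illustrative assumptions, not the referenced paper's implementation.
import numpy as np
from collections import deque

def quantile_clipped_sgd_step(theta, grad, norm_buffer, q=0.9, lr=0.01):
    """One SGD step with the gradient norm clipped at the q-quantile of recent norms."""
    g_norm = np.linalg.norm(grad)
    norm_buffer.append(g_norm)
    threshold = np.quantile(norm_buffer, q)    # clipping level from recent history
    if g_norm > threshold and g_norm > 0:
        grad = grad * (threshold / g_norm)     # rescale heavy-tailed / outlier gradients
    return theta - lr * grad

# Usage: keep a bounded buffer of recent gradient norms across iterations.
buffer = deque(maxlen=1000)
theta = np.zeros(5)
rng = np.random.default_rng(1)
for _ in range(200):
    grad = rng.standard_t(df=2, size=5)        # heavy-tailed stochastic gradient
    theta = quantile_clipped_sgd_step(theta, grad, buffer)
```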