Related papers: Unbiased least squares regression via averaged stochastic gradient descent

Unbiased least squares regression via averaged stochastic gradient descent

URL: http://arxiv.org/abs/2406.18623v1
Date: Wed, 26 Jun 2024 11:39:22 GMT
Title: Unbiased least squares regression via averaged stochastic gradient descent
Authors: Nabil Kahalé,
Abstract summary: We consider an on-line least squares regression problem with optimal solution $theta*$ and Hessian matrix H. For $kge2$, we provide an unbiased estimator of $theta*$ that is a modification of the time-average estimator.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider an on-line least squares regression problem with optimal solution $\theta^*$ and Hessian matrix H, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k\ge2$, we provide an unbiased estimator of $\theta^*$ that is a modification of the time-average estimator, runs with an expected number of time-steps of order k, with O(1/k) expected excess risk. The constant behind the O notation depends on parameters of the regression and is a poly-logarithmic function of the smallest eigenvalue of H. We provide both a biased and unbiased estimator of the expected excess risk of the time-average estimator and of its unbiased counterpart, without requiring knowledge of either H or $\theta^*$. We describe an "average-start" version of our estimators with similar properties. Our approach is based on randomized multilevel Monte Carlo. Our numerical experiments confirm our theoretical findings.

Related papers

Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series [0.49728186750345144]
We will focus on novel more agnostic approach: moving estimator, which estimates parameters separately for every time.<n>We will show its applications for alpha-Stable distribution, which also influences Hurst exponent, hence can be used for its adaptive estimation.
arXiv Detail & Related papers (2025-05-20T08:36:49Z)
Local Polynomial Lp-norm Regression [0.0]
Local least squares estimation cannot provide optimal results when non-Gaussian noise is present. It is suggested that $L_p$-norm estimators be used to minimize the residuals when these exhibit non-normal kurtosis. We show our method's superiority over local least squares in one-dimensional data and show promising outcomes for higher dimensions, specifically in 2D.
arXiv Detail & Related papers (2025-04-25T21:04:19Z)
Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $sqrt n $-rate. We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent parameters smoothing.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients [0.8749675983608172]
We present an unbiased method for posterior means based on kinetic Langevin dynamics. Our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the gold-standard" randomized Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2023-11-08T21:19:52Z)
Estimating the minimizer and the minimum value of a regression function under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $boldsymbolx*$ and the minimum value $f*$ of a smooth and strongly convex regression function $f$. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $boldsymbolz_n$, and for the risk of estimating $f*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z)
Efficient Truncated Linear Regression with Unknown Noise Variance [26.870279729431328]
We provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown. Our estimator is based on an efficient implementation of Projected Gradient Descent on the negative-likelihood of the truncated sample.
arXiv Detail & Related papers (2022-08-25T12:17:37Z)
Constructing unbiased gradient estimators with finite variance for conditional stochastic optimization [0.0]
We study gradient descent for solving conditional optimization problems. The gradient of such a parametric nested expectation is again expressed as a nested expectation. We show under some conditions that a multilevel Monte Carlo gradient estimator is unbiased.
arXiv Detail & Related papers (2022-06-04T13:28:44Z)
Polyak-Ruppert Averaged Q-Leaning is Statistically Efficient [90.14768299744792]
We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a., averaged Q-leaning) in a $gamma$-discounted MDP. We establish normality for the iteration averaged $barboldsymbolQ_T$. In short, our theoretical analysis shows averaged Q-Leaning is statistically efficient.
arXiv Detail & Related papers (2021-12-29T14:47:56Z)
Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples. We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails. We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
We develop a new primitive for optimization: a low-bias, low-cost smoothing of ther $x_star$ of any bound of the Moreau-Yoshida function.
arXiv Detail & Related papers (2021-06-17T13:33:05Z)
SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets. Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization. This provably reduces the mean squared error. We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation. By the proposed method, the computational complexity is reduced from $O(frac16k3+mk2+mkd)$ to $O(frac16k3+frac12mk2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z)
Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications [0.6445605125467573]
gradient estimation is of crucial importance in statistics and learning theory. We consider here the classic regression setup, where a real valued square integrable r.v. $Y$ is to be predicted. We prove nonasymptotic bounds improving upon those obtained for alternative estimation methods.
arXiv Detail & Related papers (2020-06-26T15:19:43Z)
SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.