The Implicit Regularization of Stochastic Gradient Flow for Least
Squares
- URL: http://arxiv.org/abs/2003.07802v2
- Date: Fri, 19 Jun 2020 21:55:36 GMT
- Title: The Implicit Regularization of Stochastic Gradient Flow for Least
Squares
- Authors: Alnur Ali, Edgar Dobriban, and Ryan J. Tibshirani
- Abstract summary: We study the implicit regularization of mini-batch gradient descent, when applied to the fundamental problem of least squares regression.
We leverage a continuous-time differential equation having the same moments as gradient descent, which we call gradient flow.
We give a bound on the excess risk of gradient flow at time $t$, over ridge regression with tuning parameter $lambda = 1/t$.
- Score: 24.976079444818552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the implicit regularization of mini-batch stochastic gradient
descent, when applied to the fundamental problem of least squares regression.
We leverage a continuous-time stochastic differential equation having the same
moments as stochastic gradient descent, which we call stochastic gradient flow.
We give a bound on the excess risk of stochastic gradient flow at time $t$,
over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be
computed from explicit constants (e.g., the mini-batch size, step size, number
of iterations), revealing precisely how these quantities drive the excess risk.
Numerical examples show the bound can be small, indicating a tight relationship
between the two estimators. We give a similar result relating the coefficients
of stochastic gradient flow and ridge. These results hold under no conditions
on the data matrix $X$, and across the entire optimization path (not just at
convergence).
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-rnstein process driven by an appropriate L'evy process.
We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - Discrete error dynamics of mini-batch gradient descent for least squares regression [4.159762735751163]
We study the dynamics of mini-batch gradient descent for at least squares when sampling without replacement.
We also study discretization effects that a continuous-time gradient flow analysis cannot detect, and show that minibatch gradient descent converges to a step-size dependent solution.
arXiv Detail & Related papers (2024-06-06T02:26:14Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails.
We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, em variance controlled gradient (VCSG), to improve the performance of the reduced gradient (SVRG)
$lambda$ is introduced in VCSG to avoid over-reducing a variance by SVRG.
$mathcalO(min1/epsilon3/2,n1/4/epsilon)$ number of gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_pbeta(mathcalX)$, $pgeq 2, beta>fracdp$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $beta> fracd2$ or $p=infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Carath\'eodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles are in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation.
arXiv Detail & Related papers (2020-02-26T12:53:04Z) - Stochastic gradient-free descents [8.663453034925363]
We propose gradient-free methods and accelerated gradients with momentum for solving optimization problems.
We analyze the convergence behavior of these methods under the mean-variance framework.
arXiv Detail & Related papers (2019-12-31T13:56:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.