Tight Risk Bounds for Gradient Descent on Separable Data
- URL: http://arxiv.org/abs/2303.01135v1
- Date: Thu, 2 Mar 2023 10:31:58 GMT
- Title: Tight Risk Bounds for Gradient Descent on Separable Data
- Authors: Matan Schliserman and Tomer Koren
- Abstract summary: We study the generalization properties of unregularized gradient methods applied to separable linear classification.
Our risk lower bounds are the first in this context and establish the tightness of our upper bounds for any given tail decay rate.
- Score: 33.593203156666746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the generalization properties of unregularized gradient methods
applied to separable linear classification -- a setting that has received
considerable attention since the pioneering work of Soudry et al. (2018). We
establish tight upper and lower (population) risk bounds for gradient descent
in this setting, for any smooth loss function, expressed in terms of its tail
decay rate. Our bounds take the form $\Theta(r_{\ell,T}^2 / (\gamma^2 T) +
r_{\ell,T}^2 / (\gamma^2 n))$, where $T$ is the number of gradient steps, $n$ is the
size of the training set, $\gamma$ is the data margin, and $r_{\ell,T}$ is a
complexity term that depends on the tail decay rate of the loss function (and
on $T$). Our upper bound matches the best known upper bounds due to Shamir
(2021); Schliserman and Koren (2022), while extending their applicability to
virtually any smooth loss function and relaxing technical assumptions they
impose. Our risk lower bounds are the first in this context and establish the
tightness of our upper bounds for any given tail decay rate and in all
parameter regimes. The proof technique used to show these results is also
markedly simpler compared to previous work, and is straightforward to extend to
other gradient methods; we illustrate this by providing analogous results for
Stochastic Gradient Descent.
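To make the setting concrete, here is a minimal sketch (not code from the paper) of unregularized full-batch gradient descent with the smooth logistic loss on synthetic linearly separable data; the data-generation scheme, dimensions, step size, and iteration count are illustrative assumptions.

```python
# Illustrative sketch, not the paper's code: unregularized full-batch gradient
# descent on separable linear classification with the smooth logistic loss.
# All names and parameters (n, d, margin, lr, T) are assumptions for this demo.
import numpy as np

rng = np.random.default_rng(0)

def make_separable_data(n=200, d=20, margin=0.1):
    """Sample unit-norm points that a fixed unit vector w_star separates with the given margin."""
    w_star = rng.normal(size=d)
    w_star /= np.linalg.norm(w_star)
    X, y = [], []
    while len(X) < n:
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)          # keep ||x|| <= 1
        score = x @ w_star
        if abs(score) >= margin:        # enforce margin gamma w.r.t. w_star
            X.append(x)
            y.append(np.sign(score))
    return np.array(X), np.array(y)

def logistic_loss_grad(w, X, y):
    """Average logistic loss ell(z) = log(1 + exp(-z)) with z = y * <w, x>, and its gradient."""
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z))
    coef = -y / (1.0 + np.exp(z))       # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z))
    grad = (coef[:, None] * X).mean(axis=0)
    return loss, grad

X, y = make_separable_data()
w = np.zeros(X.shape[1])
lr, T = 1.0, 2000                       # the logistic loss here is 1/4-smooth, so lr <= 4 is safe
for _ in range(T):
    _, grad = logistic_loss_grad(w, X, y)
    w -= lr * grad

loss, _ = logistic_loss_grad(w, X, y)
train_err = np.mean(np.sign(X @ w) != y)
print(f"training loss {loss:.4f}, training error {train_err:.3f}, ||w|| = {np.linalg.norm(w):.2f}")
```

In runs like this the empirical loss is driven toward zero while $\|w\|$ keeps growing and the iterates align with a max-margin direction, the implicit-bias phenomenon identified by Soudry et al. (2018), which motivates the population risk bounds above.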
Related papers
- Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates [15.27596975662702]
We explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients.
Our results match the minimax lower bound in [Kamath et al., 2022], indicating that the theoretical limit of convex optimization under DP is achievable.
arXiv Detail & Related papers (2024-08-19T11:07:05Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Two Sides of One Coin: the Limits of Untuned SGD and the Power of
Adaptive Methods [22.052459124774504]
We show that adaptive methods alleviate the issues of untuned SGD, owing to a smoothness and information advantage.
Our results provide theoretical justification for adaptive methods over untuned SGD in the absence of such exponential dependency.
arXiv Detail & Related papers (2023-05-21T14:40:43Z) - Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex
Optimization [9.019243171993553]
Training steps $T$ and step-size $\eta$ might affect the generalization performance in smooth stochastic convex optimization (SCO) problems.
We first provide tight excess risk lower bounds for Gradient Descent (GD) and Stochastic Gradient Descent (SGD).
Recent works show that better rates can be attained, but the improvement is reduced when the training time is long.
arXiv Detail & Related papers (2023-03-19T20:24:33Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration; a generic extragradient sketch (not AG-EG itself) appears after this list.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability, with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Tight Second-Order Certificates for Randomized Smoothing [106.06908242424481]
We show that there also exists a universal curvature-like bound for Gaussian random smoothing.
In addition to proving the correctness of this novel certificate, we show that SoS certificates are realizable and therefore tight.
arXiv Detail & Related papers (2020-10-20T18:03:45Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z) - Gradient Methods Never Overfit On Separable Data [31.714928102950584]
We show that standard gradient methods never overfit on separable data.
We present non-asymptotic bounds on the number of margin violations over the dataset.
arXiv Detail & Related papers (2020-06-30T18:01:46Z) - Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
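As a side illustration for the bilinearly-coupled saddle-point entry above, the sketch below runs plain (Korpelevich-style) extragradient descent-ascent on a toy strongly-convex-strongly-concave objective; it is not the AG-EG algorithm from that paper, and the objective, coupling matrix, and step size are assumptions chosen for the demo.

```python
# Illustrative sketch only: plain extragradient descent-ascent on the toy
# saddle-point problem  min_x max_y  0.5*||x||^2 + x^T B y - 0.5*||y||^2,
# whose unique saddle point is (0, 0). Not the AG-EG method from the paper.
import numpy as np

rng = np.random.default_rng(1)
d = 5
B = rng.normal(size=(d, d))                 # bilinear coupling matrix (an assumption)

def grad_x(x, y):
    return x + B @ y                        # gradient in x of 0.5*||x||^2 + x^T B y

def grad_y(x, y):
    return B.T @ x - y                      # gradient in y of x^T B y - 0.5*||y||^2

x, y = rng.normal(size=d), rng.normal(size=d)
eta = 0.05                                  # small relative to the Lipschitz constant of the gradient field
for _ in range(2000):
    # extrapolation step: look ahead using the current gradients
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    # update step: move using the gradients at the extrapolated point
    x = x - eta * grad_x(x_half, y_half)
    y = y + eta * grad_y(x_half, y_half)

print(np.linalg.norm(x), np.linalg.norm(y))  # both norms should be close to zero
```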