Uniform Convergence with Square-Root Lipschitz Loss
- URL: http://arxiv.org/abs/2306.13188v1
- Date: Thu, 22 Jun 2023 20:14:08 GMT
- Title: Uniform Convergence with Square-Root Lipschitz Loss
- Authors: Lijia Zhou, Zhen Dai, Frederic Koehler, Nathan Srebro
- Abstract summary: We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative).
We also rederive and better understand "optimistic rate" and interpolation learning guarantees.
- Score: 35.52581372892922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish generic uniform convergence guarantees for Gaussian data in
terms of the Rademacher complexity of the hypothesis class and the Lipschitz
constant of the square root of the scalar loss function. We show how these
guarantees substantially generalize previous results based on smoothness
(Lipschitz constant of the derivative), and allow us to handle the broader
class of square-root-Lipschitz losses, which also includes non-smooth loss
functions appropriate for studying phase retrieval and ReLU regression, as well
as rederive and better understand "optimistic rate" and interpolation learning
guarantees.
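To make the key definition concrete, here is a schematic statement of the square-root-Lipschitz condition with the phase retrieval loss as an example; the normalization of the constant is mine and follows the paper's setup only loosely.
```latex
% A scalar loss \ell is square-root-Lipschitz (with constant H) if
\bigl|\sqrt{\ell(\hat y, y)} - \sqrt{\ell(\hat y', y)}\bigr|
  \;\le\; \sqrt{H}\,\lvert \hat y - \hat y' \rvert
  \qquad \forall\, \hat y,\, \hat y'.
% Example: the phase retrieval loss
%   \ell(\hat y, y) = (\lvert \hat y \rvert - y)^2
% is non-smooth in \hat y (its derivative jumps at \hat y = 0), yet
%   \sqrt{\ell(\hat y, y)} = \bigl|\lvert \hat y \rvert - y\bigr|
% is 1-Lipschitz in \hat y, so square-root-Lipschitz guarantees apply
% where smoothness-based ones do not.
```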
Related papers
- Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity [6.930866028329355]
We show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O}(1/\sqrt{K})$ convergence rate with constant failure rate.
Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $\|x\|$ or locally estimated through independent random samples.
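As a reference point for the algorithm family discussed above, a minimal sketch of the stochastic subgradient method on a weakly convex, non-smooth objective; the objective, noise model, and step size are illustrative assumptions, not the paper's experiments.
```python
import numpy as np

def stochastic_subgradient(grad_sample, x0, steps=1000, lr=0.01, seed=0):
    """Plain stochastic subgradient descent: x_{k+1} = x_k - lr * g_k,
    where g_k is a stochastic subgradient at x_k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_sample(x, rng)  # stochastic subgradient oracle
        x = x - lr * g
    return x

# Illustrative weakly convex objective: f(x) = |x[0]^2 - 1| + 0.5*||x||^2
# (the absolute value makes it non-smooth).
def grad_sample(x, rng):
    s = np.sign(x[0] ** 2 - 1.0)      # subgradient of the |.| term
    g = x.copy()                       # gradient of 0.5*||x||^2
    g[0] += s * 2.0 * x[0]
    return g + 0.1 * rng.standard_normal(x.shape)  # additive gradient noise

print(stochastic_subgradient(grad_sample, x0=np.array([3.0, -2.0])))
```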
arXiv Detail & Related papers (2024-01-25T06:06:31Z) - Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from
KKT Conditions for Margin Maximization [59.038366742773164]
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards satisfying the Karush-Kuhn-Tucker (KKT) conditions for margin maximization.
In this work we establish a number of settings where the satisfaction of these conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks.
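For orientation, the KKT conditions in question are those of the hard-margin problem; in the linear case they read as follows (standard form, my notation; the leaky ReLU case uses the analogous non-convex program).
```latex
\begin{aligned}
&\text{Primal:}\quad \min_{w}\ \tfrac{1}{2}\|w\|^{2}
  \quad\text{s.t.}\quad y_i\, w^{\top} x_i \ge 1 \ \ \forall i,\\
&\text{KKT:}\quad w = \textstyle\sum_i \lambda_i\, y_i\, x_i,\qquad
  \lambda_i \ge 0,\qquad
  \lambda_i\,\bigl(y_i\, w^{\top} x_i - 1\bigr) = 0 \ \ \forall i.
\end{aligned}
```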
arXiv Detail & Related papers (2023-03-02T18:24:26Z) - Agnostic Learning of General ReLU Activation Using Gradient Descent [24.73731167081234]
We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions.
Our main result establishes that, starting from random initialization, gradient descent outputs in a polynomial number of iterations, with high probability, a ReLU function that achieves a competitive error guarantee.
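A minimal sketch of the setting, assuming Gaussian covariates and a noisy teacher ReLU for illustration; this shows the gradient descent iteration, not the paper's analysis or exact parameter choices.
```python
import numpy as np

def fit_relu_gd(X, y, steps=500, lr=0.1, seed=0):
    """Gradient descent on the empirical square loss of a single ReLU,
    f_w(x) = max(0, <w, x>), starting from random initialization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d) / np.sqrt(d)  # random init
    for _ in range(steps):
        pre = X @ w
        pred = np.maximum(pre, 0.0)
        # (sub)gradient of the mean squared error through the ReLU
        grad = X.T @ ((pred - y) * (pre > 0)) / n
        w -= lr * grad
    return w

# Gaussian covariates, labels from a noisy "teacher" ReLU (illustrative).
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 10))
w_true = rng.standard_normal(10)
y = np.maximum(X @ w_true, 0.0) + 0.1 * rng.standard_normal(2000)
w_hat = fit_relu_gd(X, y)
print(np.mean((np.maximum(X @ w_hat, 0.0) - y) ** 2))  # empirical square loss
```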
arXiv Detail & Related papers (2022-08-04T15:14:05Z) - Optimistic Rates: A Unifying Theory for Interpolation Learning and
Regularization in Linear Regression [35.78863301525758]
We study a localized notion of uniform convergence known as an "optimistic rate".
Our refined analysis avoids the hidden constant and logarithmic factor in existing results.
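Schematically, an optimistic rate bounds the population loss by a term that couples the empirical loss and the complexity multiplicatively; in rough form (my notation, not the paper's exact constants):
```latex
% With probability at least 1 - \delta, uniformly over predictors h:
L(h) \;\le\; \Bigl(\sqrt{\hat L(h)} \;+\; C\,\mathcal{R}_n(\mathcal{H})\Bigr)^{2},
% where L is the population loss, \hat L the empirical loss, and
% \mathcal{R}_n(\mathcal{H}) the Rademacher complexity. Interpolators
% (\hat L(h) = 0) thus inherit a bound of order \mathcal{R}_n(\mathcal{H})^2
% with no additive slack.
```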
arXiv Detail & Related papers (2021-12-08T18:55:00Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the learner in a uniformly random order.
We give algorithms that achieve the optimal bounds and significantly improve upon the results of Garber et al. (2020).
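A minimal sketch of the random order model: the losses are fixed in advance, shuffled uniformly, and processed one at a time by online gradient descent. The quadratic losses and step size are illustrative assumptions, not the paper's algorithm.
```python
import numpy as np

def random_order_ogd(loss_grads, x0, lr=0.05, seed=0):
    """Online gradient descent in the random order model: a fixed set of
    loss gradients is presented in a uniformly random order."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(loss_grads))
    x = np.asarray(x0, dtype=float)
    for t in order:
        x = x - lr * loss_grads[t](x)  # one online step per revealed loss
    return x

# Adversarially chosen quadratic losses f_t(x) = 0.5*||x - a_t||^2 (illustrative).
rng = np.random.default_rng(2)
anchors = rng.standard_normal((100, 5))
grads = [lambda x, a=a: x - a for a in anchors]  # gradient of each f_t
print(random_order_ogd(grads, x0=np.zeros(5)))
```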
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known iteration complexity for smooth losses.
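One common instantiation of this recipe, sketched below: clip each stochastic gradient before folding it into a momentum buffer. The toy objective and Student-t noise are my assumptions for illustration, not the paper's exact algorithm or setting.
```python
import numpy as np

def clipped_momentum_sgd(grad_sample, x0, steps=1000, lr=0.01,
                         beta=0.9, clip=1.0, seed=0):
    """SGD with momentum and per-step gradient clipping, a standard
    defense against heavy-tailed gradient noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_sample(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip:                  # clip heavy-tailed gradient estimates
            g = g * (clip / norm)
        m = beta * m + (1.0 - beta) * g  # momentum averaging
        x = x - lr * m
    return x

# Toy smooth objective 0.5*||x||^2 with heavy-tailed (Student-t) gradient noise.
def grad_sample(x, rng):
    return x + rng.standard_t(df=2.0, size=x.shape)

print(clipped_momentum_sgd(grad_sample, x0=np.full(4, 5.0)))
```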
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Relative Lipschitzness in Extragradient Methods and a Direct Recipe for
Acceleration [30.369542816161378]
We show that standard extragradient methods (i.e. mirror prox and dual extrapolation) recover optimal accelerated rates for first-order minimization of smooth convex functions.
We generalize this framework to handle local and randomized notions of relative Lipschitzness.
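A minimal sketch of the extragradient update (Euclidean mirror prox): take an exploratory half-step, then update with the gradient evaluated at the half-step point. The bilinear test problem is an illustrative assumption.
```python
import numpy as np

def extragradient(F, z0, steps=500, lr=0.1):
    """Extragradient for a monotone operator F: extrapolate, then update
    using the look-ahead gradient."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z_half = z - lr * F(z)      # extrapolation step
        z = z - lr * F(z_half)      # update with the look-ahead gradient
    return z

# Bilinear saddle problem min_x max_y x*y: F(z) = (y, -x) is monotone, and
# plain gradient descent-ascent diverges on it while extragradient converges.
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, z0=np.array([1.0, 1.0])))  # approaches the saddle (0, 0)
```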
arXiv Detail & Related papers (2020-11-12T18:43:40Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.
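In symbols, the problem is (standard formulation, my notation):
```latex
% ReLU regression: given i.i.d. draws (x, y) ~ D, approximately solve
\min_{w}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Bigl[\bigl(\max(0,\,\langle w, x\rangle) - y\bigr)^{2}\Bigr],
% with the expectation replaced by an empirical average over the samples.
```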
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Exactly Computing the Local Lipschitz Constant of ReLU Networks [98.43114280459271]
The local Lipschitz constant of a neural network is a useful metric for robustness, generalization, and fairness evaluation.
We show strong inapproximability results for estimating Lipschitz constants of ReLU networks, and give an algorithm that exactly computes the local Lipschitz constant via mixed-integer programming.
We leverage this algorithm to evaluate the tightness of competing Lipschitz estimators and the effects of regularized training on the Lipschitz constant.
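For contrast with the exact mixed-integer approach above, a simple sampled lower-bound heuristic of the kind such an algorithm can be compared against; the network shape and sampling scheme are illustrative assumptions.
```python
import numpy as np

def relu_net(x, Ws):
    """Forward pass of a small fully-connected ReLU network (weights only)."""
    h = x
    for W in Ws[:-1]:
        h = np.maximum(W @ h, 0.0)
    return Ws[-1] @ h

def local_lipschitz_lower_bound(Ws, x0, radius=0.1, n_samples=2000, seed=0):
    """Heuristic LOWER bound on the local Lipschitz constant around x0:
    max over sampled pairs of ||f(u) - f(v)|| / ||u - v||. Exact computation
    requires global optimization (e.g. a mixed-integer program) instead."""
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    best = 0.0
    for _ in range(n_samples):
        u = x0 + radius * rng.uniform(-1, 1, size=d)
        v = x0 + radius * rng.uniform(-1, 1, size=d)
        dist = np.linalg.norm(u - v)
        if dist > 1e-12:
            gap = np.linalg.norm(relu_net(u, Ws) - relu_net(v, Ws))
            best = max(best, gap / dist)
    return best

rng = np.random.default_rng(3)
Ws = [rng.standard_normal((8, 4)), rng.standard_normal((1, 8))]
print(local_lipschitz_lower_bound(Ws, x0=np.zeros(4)))
```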
arXiv Detail & Related papers (2020-03-02T22:15:54Z)