Uniform Convergence with Square-Root Lipschitz Loss
- URL: http://arxiv.org/abs/2306.13188v1
- Date: Thu, 22 Jun 2023 20:14:08 GMT
- Title: Uniform Convergence with Square-Root Lipschitz Loss
- Authors: Lijia Zhou, Zhen Dai, Frederic Koehler, Nathan Srebro
- Abstract summary: We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative).
We also rederive and better understand "optimistic rate" and interpolation learning guarantees.
- Score: 35.52581372892922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish generic uniform convergence guarantees for Gaussian data in
terms of the Rademacher complexity of the hypothesis class and the Lipschitz
constant of the square root of the scalar loss function. We show how these
guarantees substantially generalize previous results based on smoothness
(Lipschitz constant of the derivative), and allow us to handle the broader
class of square-root-Lipschitz losses, which also includes non-smooth loss
functions appropriate for studying phase retrieval and ReLU regression, as well
as rederive and better understand "optimistic rate" and interpolation learning
guarantees.
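To make the key definition concrete, here is a schematic statement of the square-root-Lipschitz condition with the phase retrieval loss as an example; the normalization of the constant is mine and follows the paper's setup only loosely.
```latex
% A scalar loss \ell is square-root-Lipschitz (with constant H) if
\bigl|\sqrt{\ell(\hat y, y)} - \sqrt{\ell(\hat y', y)}\bigr|
  \;\le\; \sqrt{H}\,\lvert \hat y - \hat y' \rvert
  \qquad \forall\, \hat y,\, \hat y'.
% Example: the phase retrieval loss
%   \ell(\hat y, y) = (\lvert \hat y \rvert - y)^2
% is non-smooth in \hat y (its derivative jumps at \hat y = 0), yet
%   \sqrt{\ell(\hat y, y)} = \bigl|\lvert \hat y \rvert - y\bigr|
% is 1-Lipschitz in \hat y, so square-root-Lipschitz guarantees apply
% where smoothness-based ones do not.
```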
Related papers
- Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity [6.930866028329355]
We show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O}(1/\sqrt{K})$ convergence rate with constant failure rate.
Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $\|x\|$ or locally estimated through independent random samples.
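As a reference point for the algorithm family discussed above, a minimal sketch of the stochastic subgradient method on a weakly convex, non-smooth objective; the objective, noise model, and step size are illustrative assumptions, not the paper's experiments.
```python
import numpy as np

def stochastic_subgradient(grad_sample, x0, steps=1000, lr=0.01, seed=0):
    """Plain stochastic subgradient descent: x_{k+1} = x_k - lr * g_k,
    where g_k is a stochastic subgradient at x_k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_sample(x, rng)  # stochastic subgradient oracle
        x = x - lr * g
    return x

# Illustrative weakly convex objective: f(x) = |x[0]^2 - 1| + 0.5*||x||^2
# (the absolute value makes it non-smooth).
def grad_sample(x, rng):
    s = np.sign(x[0] ** 2 - 1.0)      # subgradient of the |.| term
    g = x.copy()                       # gradient of 0.5*||x||^2
    g[0] += s * 2.0 * x[0]
    return g + 0.1 * rng.standard_normal(x.shape)  # additive gradient noise

print(stochastic_subgradient(grad_sample, x0=np.array([3.0, -2.0])))
```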
arXiv Detail & Related papers (2024-01-25T06:06:31Z) - Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from
KKT Conditions for Margin Maximization [59.038366742773164]
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards satisfying the Karush-Kuhn-Tucker (KKT) conditions for margin maximization.
In this work we establish a number of settings where the satisfaction of these conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks.
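For orientation, the KKT conditions in question are those of the hard-margin problem; in the linear case they read as follows (standard form, my notation; the leaky ReLU case uses the analogous non-convex program).
```latex
\begin{aligned}
&\text{Primal:}\quad \min_{w}\ \tfrac{1}{2}\|w\|^{2}
  \quad\text{s.t.}\quad y_i\, w^{\top} x_i \ge 1 \ \ \forall i,\\
&\text{KKT:}\quad w = \textstyle\sum_i \lambda_i\, y_i\, x_i,\qquad
  \lambda_i \ge 0,\qquad
  \lambda_i\,\bigl(y_i\, w^{\top} x_i - 1\bigr) = 0 \ \ \forall i.
\end{aligned}
```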
arXiv Detail & Related papers (2023-03-02T18:24:26Z) - Agnostic Learning of General ReLU Activation Using Gradient Descent [24.73731167081234]
We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions.
Our main result establishes that, starting from random initialization, gradient descent outputs in a polynomial number of iterations, with high probability, a ReLU function that achieves a competitive error guarantee.
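A minimal sketch of the setting, assuming Gaussian covariates and a noisy teacher ReLU for illustration; this shows the gradient descent iteration, not the paper's analysis or exact parameter choices.
```python
import numpy as np

def fit_relu_gd(X, y, steps=500, lr=0.1, seed=0):
    """Gradient descent on the empirical square loss of a single ReLU,
    f_w(x) = max(0, <w, x>), starting from random initialization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d) / np.sqrt(d)  # random init
    for _ in range(steps):
        pre = X @ w
        pred = np.maximum(pre, 0.0)
        # (sub)gradient of the mean squared error through the ReLU
        grad = X.T @ ((pred - y) * (pre > 0)) / n
        w -= lr * grad
    return w

# Gaussian covariates, labels from a noisy "teacher" ReLU (illustrative).
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 10))
w_true = rng.standard_normal(10)
y = np.maximum(X @ w_true, 0.0) + 0.1 * rng.standard_normal(2000)
w_hat = fit_relu_gd(X, y)
print(np.mean((np.maximum(X @ w_hat, 0.0) - y) ** 2))  # empirical square loss
```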
arXiv Detail & Related papers (2022-08-04T15:14:05Z) - Optimistic Rates: A Unifying Theory for Interpolation Learning and
Regularization in Linear Regression [35.78863301525758]
We study a localized notion of uniform convergence known as an "optimistic rate".
Our refined analysis avoids the hidden constant and logarithmic factor in existing results.
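Schematically, an optimistic rate bounds the population loss by a term that couples the empirical loss and the complexity multiplicatively; in rough form (my notation, not the paper's exact constants):
```latex
% With probability at least 1 - \delta, uniformly over predictors h:
L(h) \;\le\; \Bigl(\sqrt{\hat L(h)} \;+\; C\,\mathcal{R}_n(\mathcal{H})\Bigr)^{2},
% where L is the population loss, \hat L the empirical loss, and
% \mathcal{R}_n(\mathcal{H}) the Rademacher complexity. Interpolators
% (\hat L(h) = 0) thus inherit a bound of order \mathcal{R}_n(\mathcal{H})^2
% with no additive slack.
```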
arXiv Detail & Related papers (2021-12-08T18:55:00Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the learner in a uniformly random order.
We give algorithms that achieve the optimal bounds and significantly improve upon the results of Garber et al. (2020).
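A minimal sketch of the random order model: the losses are fixed in advance, shuffled uniformly, and processed one at a time by online gradient descent. The quadratic losses and step size are illustrative assumptions, not the paper's algorithm.
```python
import numpy as np

def random_order_ogd(loss_grads, x0, lr=0.05, seed=0):
    """Online gradient descent in the random order model: a fixed set of
    loss gradients is presented in a uniformly random order."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(loss_grads))
    x = np.asarray(x0, dtype=float)
    for t in order:
        x = x - lr * loss_grads[t](x)  # one online step per revealed loss
    return x

# Adversarially chosen quadratic losses f_t(x) = 0.5*||x - a_t||^2 (illustrative).
rng = np.random.default_rng(2)
anchors = rng.standard_normal((100, 5))
grads = [lambda x, a=a: x - a for a in anchors]  # gradient of each f_t
print(random_order_ogd(grads, x0=np.zeros(5)))
```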
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known iteration complexity for smooth losses.
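One common instantiation of this recipe, sketched below: clip each stochastic gradient before folding it into a momentum buffer. The toy objective and Student-t noise are my assumptions for illustration, not the paper's exact algorithm or setting.
```python
import numpy as np

def clipped_momentum_sgd(grad_sample, x0, steps=1000, lr=0.01,
                         beta=0.9, clip=1.0, seed=0):
    """SGD with momentum and per-step gradient clipping, a standard
    defense against heavy-tailed gradient noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_sample(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip:                  # clip heavy-tailed gradient estimates
            g = g * (clip / norm)
        m = beta * m + (1.0 - beta) * g  # momentum averaging
        x = x - lr * m
    return x

# Toy smooth objective 0.5*||x||^2 with heavy-tailed (Student-t) gradient noise.
def grad_sample(x, rng):
    return x + rng.standard_t(df=2.0, size=x.shape)

print(clipped_momentum_sgd(grad_sample, x0=np.full(4, 5.0)))
```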
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Relative Lipschitzness in Extragradient Methods and a Direct Recipe for
Acceleration [30.369542816161378]
We show that standard extragradient methods (i.e. mirror prox and dual extrapolation) recover optimal accelerated rates for first-order minimization of smooth convex functions.
We generalize this framework to handle local and randomized notions of relative Lipschitzness.
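A minimal sketch of the extragradient update (Euclidean mirror prox): take an exploratory half-step, then update with the gradient evaluated at the half-step point. The bilinear test problem is an illustrative assumption.
```python
import numpy as np

def extragradient(F, z0, steps=500, lr=0.1):
    """Extragradient for a monotone operator F: extrapolate, then update
    using the look-ahead gradient."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z_half = z - lr * F(z)      # extrapolation step
        z = z - lr * F(z_half)      # update with the look-ahead gradient
    return z

# Bilinear saddle problem min_x max_y x*y: F(z) = (y, -x) is monotone, and
# plain gradient descent-ascent diverges on it while extragradient converges.
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, z0=np.array([1.0, 1.0])))  # approaches the saddle (0, 0)
```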
arXiv Detail & Related papers (2020-11-12T18:43:40Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.
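In symbols, the problem is (standard formulation, my notation):
```latex
% ReLU regression: given i.i.d. draws (x, y) ~ D, approximately solve
\min_{w}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Bigl[\bigl(\max(0,\,\langle w, x\rangle) - y\bigr)^{2}\Bigr],
% with the expectation replaced by an empirical average over the samples.
```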
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Exactly Computing the Local Lipschitz Constant of ReLU Networks [98.43114280459271]
The local Lipschitz constant of a neural network is a useful metric for robustness, generalization, and fairness evaluation.
We show strong inapproximability results for estimating Lipschitz constants of ReLU networks, and give an algorithm that exactly computes the local Lipschitz constant via mixed-integer programming.
We leverage this algorithm to evaluate the tightness of competing Lipschitz estimators and the effects of regularized training on the Lipschitz constant.
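For contrast with the exact mixed-integer approach above, a simple sampled lower-bound heuristic of the kind such an algorithm can be compared against; the network shape and sampling scheme are illustrative assumptions.
```python
import numpy as np

def relu_net(x, Ws):
    """Forward pass of a small fully-connected ReLU network (weights only)."""
    h = x
    for W in Ws[:-1]:
        h = np.maximum(W @ h, 0.0)
    return Ws[-1] @ h

def local_lipschitz_lower_bound(Ws, x0, radius=0.1, n_samples=2000, seed=0):
    """Heuristic LOWER bound on the local Lipschitz constant around x0:
    max over sampled pairs of ||f(u) - f(v)|| / ||u - v||. Exact computation
    requires global optimization (e.g. a mixed-integer program) instead."""
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    best = 0.0
    for _ in range(n_samples):
        u = x0 + radius * rng.uniform(-1, 1, size=d)
        v = x0 + radius * rng.uniform(-1, 1, size=d)
        dist = np.linalg.norm(u - v)
        if dist > 1e-12:
            gap = np.linalg.norm(relu_net(u, Ws) - relu_net(v, Ws))
            best = max(best, gap / dist)
    return best

rng = np.random.default_rng(3)
Ws = [rng.standard_normal((8, 4)), rng.standard_normal((1, 8))]
print(local_lipschitz_lower_bound(Ws, x0=np.zeros(4)))
```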
arXiv Detail & Related papers (2020-03-02T22:15:54Z)