A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models
- URL: http://arxiv.org/abs/2210.12082v1
- Date: Fri, 21 Oct 2022 16:16:55 GMT
- Title: A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models
- Authors: Lijia Zhou and Frederic Koehler and Pragya Sur and Danica J.
Sutherland and Nathan Srebro
- Abstract summary: We prove a new generalization bound showing that, for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under a continuous loss control the test error under all Moreau envelopes of that loss.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021) for linear regression with the square loss.
We show that applying our generalization bound with localized Gaussian width is generally sharp for empirical risk minimizers.
- Score: 33.36787620121057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove a new generalization bound that shows for any class of linear
predictors in Gaussian space, the Rademacher complexity of the class and the
training error under any continuous loss $\ell$ can control the test error
under all Moreau envelopes of the loss $\ell$. We use our finite-sample bound
to directly recover the "optimistic rate" of Zhou et al. (2021) for linear
regression with the square loss, which is known to be tight for minimal
$\ell_2$-norm interpolation, but we also handle more general settings where the
label is generated by a potentially misspecified multi-index model. The same
argument can analyze noisy interpolation of max-margin classifiers through the
squared hinge loss, and establishes consistency results in spiked-covariance
settings. More generally, when the loss is only assumed to be Lipschitz, our
bound effectively improves Talagrand's well-known contraction lemma by a factor
of two, and we prove uniform convergence of interpolators (Koehler et al. 2021)
for all smooth, non-negative losses. Finally, we show that application of our
generalization bound using localized Gaussian width will generally be sharp for
empirical risk minimizers, establishing a non-asymptotic Moreau envelope theory
for generalization that applies outside of proportional scaling regimes,
handles model misspecification, and complements existing asymptotic Moreau
envelope theories for M-estimation.
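For reference, the Moreau envelope that the bound is stated in terms of has the standard definition below; the displayed inequality is only a schematic of the result's shape (test error under the envelope controlled by training error plus a complexity term), with constants and the exact complexity term simplified relative to the paper.
% Standard Moreau envelope of a function f at smoothing level \lambda > 0
% (applied here to the loss in its prediction argument):
\[
  f_\lambda(t) \;=\; \inf_{s \in \mathbb{R}} \Big\{ f(s) + \tfrac{1}{2\lambda}\,(t - s)^2 \Big\}.
\]
% Schematic shape of the guarantee (illustrative only):
\[
  \underbrace{\mathbb{E}\,\ell_\lambda\big(\langle w, x\rangle;\, y\big)}_{\text{test error under the envelope}}
  \;\lesssim\;
  \underbrace{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} \ell\big(\langle w, x_i\rangle;\, y_i\big)}_{\text{training error under } \ell}
  \;+\; \text{complexity term (Rademacher / localized Gaussian width)}.
\]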
Related papers
- Precise Asymptotic Generalization for Multiclass Classification with
Overparameterized Linear Models [4.093769373833101]
We resolve the conjecture posed in Subramanian et al. (2022), in the regime where the number of data points, features, and classes all grow together.
Our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate asymptotically goes to 0 or 1.
The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels.
arXiv Detail & Related papers (2023-06-23T00:59:15Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under a standard Restricted Isometry Property (RIP) assumption on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z) - General Loss Functions Lead to (Approximate) Interpolation in High
Dimensions [6.738946307589741]
We provide a unified framework to approximately characterize the implicit bias of gradient descent in closed form.
Specifically, we show that the implicit bias is approximately (but not exactly equal to) minimum-norm interpolation in high dimensions.
Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings.
arXiv Detail & Related papers (2023-03-13T21:23:12Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Double Descent in Random Feature Models: Precise Asymptotic Analysis for
General Convex Regularization [4.8900735721275055]
We provide precise asymptotic expressions for the generalization error of random features regression under a broad class of convex regularization terms.
We numerically demonstrate the predictive capacity of our framework, and show experimentally that the predicted test error is accurate even in the non-asymptotic regime.
arXiv Detail & Related papers (2022-04-06T08:59:38Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve robust generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and
Benign Overfitting [35.78863301525758]
We prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class.
Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators.
We show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings; a minimal numerical sketch of minimum-norm interpolation appears after this list.
arXiv Detail & Related papers (2021-06-17T06:58:10Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z)
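As referenced above, the following is a minimal numerical sketch (not code from any of the papers listed): it constructs the minimum-l2-norm interpolator in an overparameterized Gaussian linear model and reports its training error and population test error, the kind of interpolator whose generalization the optimistic-rate and Gaussian-width bounds above are designed to control. The dimensions, noise level, and ground-truth signal are arbitrary choices made for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2000       # n samples, d >> n features (illustrative sizes)
sigma = 0.5            # label noise level (assumed for illustration)

# Ground-truth signal concentrated on a single coordinate.
beta_star = np.zeros(d)
beta_star[0] = 1.0

# Isotropic Gaussian design and noisy labels.
X = rng.standard_normal((n, d))
y = X @ beta_star + sigma * rng.standard_normal(n)

# Minimum-l2-norm interpolator: beta_hat = X^+ y (pseudoinverse solution).
beta_hat = np.linalg.pinv(X) @ y

# Training error is (numerically) zero: the estimator interpolates.
train_err = np.mean((X @ beta_hat - y) ** 2)

# Population square-loss test error for x ~ N(0, I):
# E[(x @ beta_hat - y)^2] = ||beta_hat - beta_star||^2 + sigma^2.
test_err = np.sum((beta_hat - beta_star) ** 2) + sigma ** 2

print(f"training error:        {train_err:.2e}")
print(f"population test error: {test_err:.3f}")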