Asymptotics of Ridge(less) Regression under General Source Condition
- URL: http://arxiv.org/abs/2006.06386v3
- Date: Mon, 8 Mar 2021 10:01:35 GMT
- Title: Asymptotics of Ridge(less) Regression under General Source Condition
- Authors: Dominic Richards, Jaouad Mourtada and Lorenzo Rosasco
- Abstract summary: We consider the role played by the structure of the true regression parameter.
We show that interpolation (no regularisation) can be optimal even with bounded signal-to-noise ratio (SNR).
This contrasts with previous work considering ridge regression with isotropic prior, in which case interpolation is only optimal in the limit of infinite SNR.
- Score: 26.618200633139256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the prediction error of ridge regression in an asymptotic regime
where the sample size and dimension go to infinity at a proportional rate. In
particular, we consider the role played by the structure of the true regression
parameter. We observe that the case of a general deterministic parameter can be
reduced to the case of a random parameter from a structured prior. The latter
assumption is a natural adaptation of classic smoothness assumptions in
nonparametric regression, which are known as source conditions in the
context of regularization theory for inverse problems. Roughly speaking, we
assume the large coefficients of the parameter are in correspondence to the
principal components. In this setting a precise characterisation of the test
error is obtained, depending on the input covariance and regression parameter
structure. We illustrate this characterisation in a simplified setting to
investigate the influence of the true parameter on optimal regularisation for
overparameterized models. We show that interpolation (no regularisation) can be
optimal even with bounded signal-to-noise ratio (SNR), provided that the
parameter coefficients are larger on high-variance directions of the data,
corresponding to a more regular function than posited by the regularization
term. This contrasts with previous work considering ridge regression with
isotropic prior, in which case interpolation is only optimal in the limit of
infinite SNR.
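As a rough illustration of the interpolation claim, the Python sketch below simulates ridge regression in a toy overparameterized setting ($d > n$). It relies on two standard facts: the ridge estimator is $\hat\beta_\lambda = (X^\top X/n + \lambda I)^{-1} X^\top y/n$, and as $\lambda \to 0$ it converges to the minimum-norm least-squares solution $X^{+}y$, which interpolates the training data when $d > n$. The data model is an assumption, not the paper's exact setting: a diagonal covariance with eigenvalues $s_j = 1/j$, and a Gaussian prior on the coefficients with variance proportional to $s_j^{\alpha}$, so that $\alpha = 0$ is the isotropic prior and $\alpha > 0$ makes the coefficients larger on high-variance directions, loosely in the spirit of a source condition. The helper names `ridge_estimator` and `excess_risk_curve` and all default parameter values are hypothetical.

```python
import numpy as np


def ridge_estimator(X, y, lam):
    """Minimize ||y - X b||^2 / n + lam * ||b||^2 over b.

    lam == 0 returns the ridgeless limit: the minimum-norm least-squares
    solution X^+ y, which interpolates the training data when d > n.
    """
    n, d = X.shape
    if lam == 0:
        # Pseudoinverse gives the minimum-l2-norm least-squares solution.
        return np.linalg.pinv(X) @ y
    # Normal equations of the penalized problem: (X^T X / n + lam I) b = X^T y / n.
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def excess_risk_curve(n=200, d=400, alpha=0.0, snr=2.0,
                      lams=(0.0, 1e-3, 1e-2, 1e-1, 1.0), trials=20, seed=0):
    """Monte Carlo estimate of E[(b_hat - b)^T Sigma (b_hat - b)] for each lam.

    Toy model (an assumption, not the paper's exact setting):
      * Sigma is diagonal with power-law eigenvalues s_j = 1 / j;
      * the prior is b_j ~ N(0, c * s_j**alpha), with c chosen so that
        E[b^T Sigma b] = snr while the noise variance is 1.
    alpha = 0 is the isotropic prior; alpha > 0 makes coefficients larger
    on high-variance directions.
    """
    rng = np.random.default_rng(seed)
    s = 1.0 / np.arange(1, d + 1)                        # eigenvalues of Sigma
    prior_var = s ** alpha
    prior_var = prior_var * snr / np.sum(s * prior_var)  # sets E[b^T Sigma b] = snr
    risks = np.zeros(len(lams))
    for _ in range(trials):
        b = rng.normal(size=d) * np.sqrt(prior_var)      # true parameter drawn from the prior
        X = rng.normal(size=(n, d)) * np.sqrt(s)         # rows ~ N(0, Sigma)
        y = X @ b + rng.normal(size=n)                   # unit-variance noise
        for k, lam in enumerate(lams):
            err = ridge_estimator(X, y, lam) - b
            risks[k] += np.sum(s * err ** 2) / trials    # excess test risk
    return risks


if __name__ == "__main__":
    # Compare the isotropic prior (alpha = 0) with an aligned prior (alpha = 2).
    for alpha in (0.0, 2.0):
        print(alpha, excess_risk_curve(alpha=alpha))
```

Each call returns the averaged excess risk for every value in `lams`, so one can check whether the minimum sits at $\lambda = 0$ for each prior. Whether it does depends on $n$, $d$, the spectrum, `alpha` and `snr`; the sketch says nothing about the paper's precise conditions.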
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
Date: 2024-07-11T13:28:34Z
- Overparameterized Multiple Linear Regression as Hyper-Curve Fitting [0.0]
It is proven that a linear model will produce exact predictions even in the presence of nonlinear dependencies that violate the model assumptions.
The hyper-curve approach is especially suited for the regularization of problems with noise in predictor variables and can be used to remove noisy and "improper" predictors from the model.
Date: 2024-04-11T15:43:11Z
- Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus [0.46040036610482665]
We introduce a gradient-based approach to the problem of linear regression with l2-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves to be more efficient in terms of computational time than automatic differentiation.
Date: 2023-11-23T20:03:51Z
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
Date: 2023-01-16T02:57:37Z
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
Date: 2022-11-02T16:39:42Z
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
Date: 2021-10-13T17:47:39Z
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Empirically, inductive biases are central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
Date: 2021-03-23T17:15:53Z
- Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors.
We use a strictly convex, smooth variant of the Huber loss function with a tuning parameter that depends on the parameters of the problem.
For the resulting estimator we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm.
Date: 2020-11-03T09:46:31Z
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
Date: 2020-07-16T13:27:47Z
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
Date: 2020-06-16T04:27:38Z
- On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
Date: 2019-04-18T02:56:00Z