Dimension-free deterministic equivalents and scaling laws for random feature regression
- URL: http://arxiv.org/abs/2405.15699v3
- Date: Tue, 05 Nov 2024 15:19:55 GMT
- Title: Dimension-free deterministic equivalents and scaling laws for random feature regression
- Authors: Leonardo Defilippis, Bruno Loureiro, Theodor Misiakiewicz,
- Abstract summary: We show that the test error is well approximated by a closed-form expression that only depends on the feature map eigenvalues.
Notably, our approximation guarantee is non-asymptotic, multiplicative, and independent of the feature map dimension.
- Score: 11.607594737176973
- License:
- Abstract: In this work we investigate the generalization performance of random feature ridge regression (RFRR). Our main contribution is a general deterministic equivalent for the test error of RFRR. Specifically, under a certain concentration property, we show that the test error is well approximated by a closed-form expression that only depends on the feature map eigenvalues. Notably, our approximation guarantee is non-asymptotic, multiplicative, and independent of the feature map dimension -- allowing for infinite-dimensional features. We expect this deterministic equivalent to hold broadly beyond our theoretical analysis, and we empirically validate its predictions on various real and synthetic datasets. As an application, we derive sharp excess error rates under standard power-law assumptions of the spectrum and target decay. In particular, we provide a tight result for the smallest number of features achieving optimal minimax error rate.
Related papers
- A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator [7.163829696591103]
We consider learning an unknown target function $f_*$ using kernel ridge regression.
We establish a non-asymptotic deterministic approximation for the test error of KRR.
arXiv Detail & Related papers (2024-03-13T20:12:03Z) - Error Bounds for Learning with Vector-Valued Random Features [2.375038919274297]
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF)
The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting.
arXiv Detail & Related papers (2023-05-26T18:00:08Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Off-policy estimation of linear functionals: Non-asymptotic theory for
semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
We prove non-asymptotic upper bounds on the mean-squared error of such procedures.
We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
arXiv Detail & Related papers (2022-09-26T23:50:55Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Adversarial Estimation of Riesz Representers [21.510036777607397]
We propose an adversarial framework to estimate the Riesz representer using general function spaces.
We prove a nonasymptotic mean square rate in terms of an abstract quantity called the critical radius, then specialize it for neural networks, random forests, and reproducing kernel Hilbert spaces as leading cases.
arXiv Detail & Related papers (2020-12-30T19:46:57Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in
High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z) - Minimax Estimation of Conditional Moment Models [40.95498063465325]
We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game.
We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces.
We show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, lead to mean squared error rates.
arXiv Detail & Related papers (2020-06-12T14:02:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.