Related papers: Test Set Sizing for the Ridge Regression

URL: http://arxiv.org/abs/2504.19231v1
Date: Sun, 27 Apr 2025 13:17:18 GMT
Title: Test Set Sizing for the Ridge Regression
Authors: Alexander Dubbs
Abstract summary: This is the first time that such a split is calculated mathematically for a machine learning model in the large data limit. The goal of the calculations is to maximize "integrity," so that the measured error in the trained model is as close as possible to what it theoretically should be.
Score: 55.2480439325792
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: We derive the ideal train/test split for the ridge regression to high accuracy in the limit that the number of training rows m becomes large. The split must depend on the ridge tuning parameter, alpha, but we find that the dependence is weak and can asymptotically be ignored; all parameters vanish except for m and the number of features, n. This is the first time that such a split is calculated mathematically for a machine learning model in the large data limit. The goal of the calculations is to maximize "integrity," so that the measured error in the trained model is as close as possible to what it theoretically should be. This paper's result for the ridge regression split matches prior art for the plain vanilla linear regression split to the first two terms asymptotically, and it appears that practically there is no difference.
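The idea of choosing a split so that the measured test error tracks the true noise level can be explored numerically. The sketch below is not the paper's derivation: it simulates a Gaussian linear model, fits ridge regression in closed form, and uses the mean squared deviation between the test-set mean squared error and the known noise variance as an assumed stand-in for "integrity." The function names (`ridge_fit`, `integrity_gap`, `best_test_size`) and all default parameter values are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X^T X + alpha I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def integrity_gap(total_rows, n_features, test_rows,
                  alpha=1.0, sigma=1.0, trials=200, seed=0):
    """Average squared gap between test MSE and the true noise variance.

    ASSUMPTION: this gap is only a proxy for the "integrity" objective
    named in the abstract; the paper's exact definition may differ.
    """
    train_rows = total_rows - test_rows
    if train_rows < 1 or test_rows < 1:
        raise ValueError("both the train and test sets need at least one row")
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(trials):
        # Simulated data: Gaussian features, Gaussian weights, Gaussian noise.
        w = rng.standard_normal(n_features)
        X = rng.standard_normal((total_rows, n_features))
        y = X @ w + sigma * rng.standard_normal(total_rows)
        # Fit on the first train_rows rows, evaluate on the remainder.
        w_hat = ridge_fit(X[:train_rows], y[:train_rows], alpha)
        resid = y[train_rows:] - X[train_rows:] @ w_hat
        gaps.append((np.mean(resid ** 2) - sigma ** 2) ** 2)
    return float(np.mean(gaps))


def best_test_size(total_rows, n_features, candidates, **kwargs):
    """Return the candidate test-set size with the smallest simulated gap."""
    scores = {t: integrity_gap(total_rows, n_features, t, **kwargs)
              for t in candidates}
    return min(scores, key=scores.get), scores
```

Calling `best_test_size(500, 10, range(25, 476, 25))` scans a grid of test-set sizes; repeating the scan for several values of `alpha` gives a rough empirical check on the abstract's claim that the dependence on the ridge parameter is weak.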

Related papers

Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite-dimensional linear regression setup. We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z)
Regularization properties of adversarially-trained linear regression [5.7077257711082785]
State-of-the-art machine learning models can be vulnerable to very small input perturbations. Adversarial training is an effective approach to defend against it.
arXiv Detail & Related papers (2023-10-16T20:09:58Z)
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features. We find new and interesting properties that do not exist in single-task linear regression. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data in terms of the bias and variance of an "equivalent" sequence model. As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z)
Surprises in adversarially-trained linear regression [12.33259114006129]
Adversarial training is one of the most effective approaches to defend against adversarial examples. We show that for linear regression problems, adversarial training can be formulated as a convex problem. We show that for sufficiently many features or sufficiently small regularization parameters, the learned model perfectly interpolates the training data.
arXiv Detail & Related papers (2022-05-25T11:54:42Z)
Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression [14.493176427999028]
We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models. We show that the T-learner fails to achieve consistency except under random assignment, while the risk of the IPW-learner converges to zero if the propensity score is known.
arXiv Detail & Related papers (2022-02-10T18:51:52Z)
Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise. This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$. The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or $p = \infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z)
Benign overfitting in ridge regression [0.0]
We provide non-asymptotic generalization bounds for overparametrized ridge regression. We identify when small or negative regularization is sufficient for obtaining small generalization error.
arXiv Detail & Related papers (2020-09-29T20:00:31Z)
Additive interaction modelling using I-priors [0.571097144710995]
We introduce a parsimonious specification of models with interactions, which has two benefits. It reduces the number of scale parameters and thus facilitates the estimation of models with interactions.
arXiv Detail & Related papers (2020-07-30T22:52:22Z)
Minimax Semiparametric Learning With Approximate Sparsity [3.5136198842746524]
This paper formalizes the concept of approximate model sparsity through classical semi-parametric theory. We derive minimax rates for a regression slope and an average derivative, finding these bounds to be substantially larger than those in low-dimensional, semi-parametric settings.
arXiv Detail & Related papers (2019-12-27T16:13:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.