Bias-variance decomposition of overparameterized regression with random
linear features
- URL: http://arxiv.org/abs/2203.05443v1
- Date: Thu, 10 Mar 2022 16:09:21 GMT
- Title: Bias-variance decomposition of overparameterized regression with random
linear features
- Authors: Jason W. Rocks, Pankaj Mehta
- Abstract summary: "Over parameterized models" avoid overfitting even when the number of fit parameters is large enough to perfectly fit the training data.
We show how each transition arises due to small nonzero eigenvalues in the Hessian matrix.
We compare and contrast the phase diagram of the random linear features model to the random nonlinear features model and ordinary regression.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In classical statistics, the bias-variance trade-off describes how varying a
model's complexity (e.g., number of fit parameters) affects its ability to make
accurate predictions. According to this trade-off, optimal performance is
achieved when a model is expressive enough to capture trends in the data, yet
not so complex that it overfits idiosyncratic features of the training data.
Recently, it has become clear that this classic understanding of the
bias-variance trade-off must be fundamentally revisited in light of the incredible
predictive performance of "overparameterized models" -- models that avoid
overfitting even when the number of fit parameters is large enough to perfectly
fit the training data. Here, we present results for one of the simplest
examples of an overparameterized model: regression with random linear features
(i.e. a two-layer neural network with a linear activation function). Using the
zero-temperature cavity method, we derive analytic expressions for the training
error, test error, bias, and variance. We show that the linear random features
model exhibits three phase transitions: two different transitions to an
interpolation regime where the training error is zero, along with an additional
transition between regimes with large bias and minimal bias. Using random
matrix theory, we show how each transition arises due to small nonzero
eigenvalues in the Hessian matrix. Finally, we compare and contrast the phase
diagram of the random linear features model to the random nonlinear features
model and ordinary regression, highlighting the new phase transitions that
result from the use of linear basis functions.
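As a companion to the abstract, the following is a minimal numerical sketch of the setup it describes: a random linear features model (a fixed random linear first layer followed by a least-squares second layer) fit with the minimum-norm pseudoinverse solution, together with an empirical bias-variance decomposition obtained by averaging predictions over draws of the training-label noise. The dimensions, Gaussian teacher, noise level, and the choice to vary only the label noise across draws are illustrative assumptions; this is not the paper's cavity-method derivation, which treats the decomposition analytically.

```python
# Minimal sketch (illustrative assumptions, not the paper's analytic calculation):
# regression with random linear features = two-layer network with a fixed random
# linear first layer, second layer fit by minimum-norm least squares, and an
# empirical bias-variance decomposition over draws of the training-label noise.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test = 100, 1000   # training / test set sizes
d, p = 150, 200               # input dimension / number of random linear features
sigma = 0.5                   # label-noise standard deviation
n_draws = 200                 # noise realizations averaged over

# Fixed "teacher" vector and fixed random first layer W (linear activation).
beta_star = rng.standard_normal(d) / np.sqrt(d)
W = rng.standard_normal((d, p)) / np.sqrt(d)

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_test_clean = X_test @ beta_star            # noiseless test targets

Z_train = X_train @ W                        # random linear features
Z_test = X_test @ W

preds = np.empty((n_draws, n_test))
train_errs = np.empty(n_draws)
for t in range(n_draws):
    y_train = X_train @ beta_star + sigma * rng.standard_normal(n_train)
    # Minimum-norm least-squares fit of the second layer; with d > n_train and
    # p > n_train it generically interpolates the noisy training labels.
    a_hat = np.linalg.pinv(Z_train) @ y_train
    train_errs[t] = np.mean((Z_train @ a_hat - y_train) ** 2)
    preds[t] = Z_test @ a_hat

mean_pred = preds.mean(axis=0)
bias2 = np.mean((mean_pred - y_test_clean) ** 2)    # squared bias
variance = np.mean(preds.var(axis=0))               # variance over noise draws
test_error = np.mean((preds - y_test_clean) ** 2)   # test error w.r.t. the clean signal

print(f"training error ~ {train_errs.mean():.2e} (near zero: interpolation regime)")
print(f"bias^2 ~ {bias2:.4f}, variance ~ {variance:.4f}, "
      f"test error ~ {test_error:.4f} (= bias^2 + variance: {bias2 + variance:.4f})")
```

Because d and p both exceed n_train in this sketch, the feature matrix generically has full row rank, so the fit sits in an interpolation regime of the kind the abstract refers to; shrinking d or p below n_train moves it out of that regime.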
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$, where $M$ is the number of parameters and $N$ is the number of data points.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - Analysis of Interpolating Regression Models and the Double Descent
Phenomenon [3.883460584034765]
It is commonly assumed that models which interpolate noisy training data generalize poorly.
The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases.
We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double-descent shape of the testing error as a function of model order (a numerical sketch of this relationship appears after this list).
arXiv Detail & Related papers (2023-04-17T09:44:33Z) - On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z) - The Geometry of Over-parameterized Regression and Adversarial
Perturbations [0.0]
We present an alternative geometric interpretation of regression that applies to both under- and over-parameterized models.
We show that adversarial perturbations are a generic feature of biased models, arising from the underlying geometry.
arXiv Detail & Related papers (2021-03-25T19:52:08Z) - Memorizing without overfitting: Bias, variance, and interpolation in
over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z) - Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that enjoys the same good properties as linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
arXiv Detail & Related papers (2020-07-08T07:17:28Z)