Related papers: Analysis of Interpolating Regression Models and the Double Descent Phenomenon

Analysis of Interpolating Regression Models and the Double Descent Phenomenon

URL: http://arxiv.org/abs/2304.08113v1
Date: Mon, 17 Apr 2023 09:44:33 GMT
Title: Analysis of Interpolating Regression Models and the Double Descent Phenomenon
Authors: Tomas McKelvey
Abstract summary: It is commonly assumed that models which interpolate noisy training data are poor to generalize. The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases. We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
Score: 3.883460584034765
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A regression model with more parameters than data points in the training data is overparametrized and has the capability to interpolate the training data. Based on the classical bias-variance tradeoff expressions, it is commonly assumed that models which interpolate noisy training data are poor to generalize. In some cases, this is not true. The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases. In this contribution, we provide some analysis to explain the double descent phenomenon, first reported in the machine learning literature. We focus on interpolating models derived from the minimum norm solution to the classical least-squares problem and also briefly discuss model fitting using ridge regression. We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.

Related papers

Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression [4.71547360356314]
Model collapse occurs when generative models degrade after repeatedly training on their own synthetic outputs.<n>We derive precise error formulae for minimum-$ell$-norm and ridge regression under this iterative scheme.<n>Our analysis reveals intriguing properties of the mixing weight that minimizes long-term prediction error.
arXiv Detail & Related papers (2025-09-26T13:34:48Z)
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data [49.73114504515852]
We show that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse. We demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse.
arXiv Detail & Related papers (2024-04-01T18:31:24Z)
Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle [12.00962791565144]
Double descent is a surprising phenomenon in machine learning. As the number of model parameters grows relative to the number of data, test error drops.
arXiv Detail & Related papers (2023-03-24T17:03:40Z)
Bias-variance decomposition of overparameterized regression with random linear features [0.0]
"Over parameterized models" avoid overfitting even when the number of fit parameters is large enough to perfectly fit the training data. We show how each transition arises due to small nonzero eigenvalues in the Hessian matrix. We compare and contrast the phase diagram of the random linear features model to the random nonlinear features model and ordinary regression.
arXiv Detail & Related papers (2022-03-10T16:09:21Z)
X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning. To take the power of both worlds, we propose a novel X-model. X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates. We formulate the regression-free model updates into a constrained optimization problem. We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
Asymptotics of Ridge Regression in Convolutional Models [26.910291664252973]
We derive exact formulae for estimation error of ridge estimators that hold in a certain high-dimensional regime. We show the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.
arXiv Detail & Related papers (2021-03-08T05:56:43Z)
Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips" A new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model. We propose a simple approach for PC training, Focal Distillation, which enforces congruence with the reference model.
arXiv Detail & Related papers (2020-11-18T09:00:44Z)
Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning. Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z)
Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Dimension Independent Generalization Error by Stochastic Gradient Descent [12.474236773219067]
We present a theory on the generalization error of descent (SGD) solutions for both and locally convex loss functions. We show that the generalization error does not depend on the $p$ dimension or depends on the low effective $p$logarithmic factor.
arXiv Detail & Related papers (2020-03-25T03:08:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.