Least Squares Regression Can Exhibit Under-Parameterized Double Descent
- URL: http://arxiv.org/abs/2305.14689v3
- Date: Thu, 24 Oct 2024 20:32:20 GMT
- Title: Least Squares Regression Can Exhibit Under-Parameterized Double Descent
- Authors: Xinyue Li, Rishi Sonthalia,
- Abstract summary: We study the relationship between the number of training data points, the number of parameters, and the generalization capabilities of models.
We postulate that the location of the peak depends on the technical properties of both the spectrum as well as the eigenvectors of the sample covariance.
- Score: 6.645111950779666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The relationship between the number of training data points, the number of parameters, and the generalization capabilities of models has been widely studied. Previous work has shown that double descent can occur in the over-parameterized regime and that the standard bias-variance trade-off holds in the under-parameterized regime. These works provide multiple reasons for the existence of the peak. We postulate that the location of the peak depends on the technical properties of both the spectrum as well as the eigenvectors of the sample covariance. We present two simple examples that provably exhibit double descent in the under-parameterized regime and do not seem to occur for reasons provided in prior work.
Related papers
- A Sparsity Principle for Partially Observable Causal Representation Learning [28.25303444099773]
Causal representation learning aims at identifying high-level causal variables from perceptual data.
We focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern.
We propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation.
arXiv Detail & Related papers (2024-03-13T08:40:49Z) - Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse
Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show that double descent appears exactly when and where it occurs, and that its location is not inherently tied to the threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - A Unified Analysis of Multi-task Functional Linear Regression Models
with Manifold Constraint and Composite Quadratic Penalty [0.0]
The power of multi-task learning is brought in by imposing additional structures over the slope functions.
We show the composite penalty induces a specific norm, which helps to quantify the manifold curvature.
A unified convergence upper bound is obtained and specifically applied to the reduced-rank model and the graph Laplacian regularized model.
arXiv Detail & Related papers (2022-11-09T13:32:23Z) - Time varying regression with hidden linear dynamics [74.9914602730208]
We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.
Counterintuitively, we show that when the underlying dynamics are stable the parameters of this model can be estimated from data by combining just two ordinary least squares estimates.
arXiv Detail & Related papers (2021-12-29T23:37:06Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Doubly Robust Semiparametric Difference-in-Differences Estimators with
High-Dimensional Data [15.27393561231633]
We propose a doubly robust two-stage semiparametric difference-in-difference estimator for estimating heterogeneous treatment effects.
The first stage allows a general set of machine learning methods to be used to estimate the propensity score.
In the second stage, we derive the rates of convergence for both the parametric parameter and the unknown function.
arXiv Detail & Related papers (2020-09-07T15:14:29Z) - Multiple Descent: Design Your Own Generalization Curve [46.47831396167738]
We show that the generalization curve can have an arbitrary number of peaks, and moreover, locations of those peaks can be explicitly controlled.
Our results highlight the fact that both classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family.
arXiv Detail & Related papers (2020-08-03T17:22:21Z) - An Investigation of Why Overparameterization Exacerbates Spurious
Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause over parameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.