Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator
- URL: http://arxiv.org/abs/2309.15769v2
- Date: Thu, 30 May 2024 13:43:44 GMT
- Title: Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator
- Authors: Dennis Shen, Dogyoon Song, Peng Ding, Jasjeet S. Sekhon
- Abstract summary: We provide results for the minimum $\ell_2$-norm OLS interpolator.
We present statistical results such as an extension of the Gauss-Markov theorem.
We conduct simulations that further explore the properties of the OLS interpolator.
- Score: 3.4320157633663064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning research has uncovered the phenomenon of benign overfitting for overparameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential to gain foundational insights into this phenomenon. While properties of OLS are well established in classical, underparameterized settings, its behavior in high-dimensional, overparameterized regimes is less explored (unlike for ridge or lasso regression) though significant progress has been made of late. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem in the overparameterized regime. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Under the Gauss-Markov model, we present statistical results such as an extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors for the overparameterized regime. To substantiate our theoretical contributions, we conduct simulations that further explore the stochastic properties of the OLS interpolator.
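As a rough illustration of the object the abstract studies, the minimum $\ell_2$-norm OLS interpolator can be computed with the Moore-Penrose pseudoinverse. The sketch below is not the authors' code: the function name `min_norm_ols`, the dimensions, and the simulated Gaussian data are all assumptions made for the example. It checks that the estimator fits the training data exactly and has a smaller norm than another interpolating solution.

```python
import numpy as np

def min_norm_ols(X, y):
    """Minimum l2-norm solution of X @ beta = y (hypothetical helper).

    Uses the Moore-Penrose pseudoinverse, which selects the solution
    lying in the row space of X.
    """
    return np.linalg.pinv(X) @ y

# Simulated data (an assumption for illustration): n < p, so the model
# is overparameterized and X has full row rank almost surely.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_hat = min_norm_ols(X, y)

# Build a second interpolator by adding a component from the null space of X.
null_projector = np.eye(p) - np.linalg.pinv(X) @ X
beta_other = beta_hat + null_projector @ rng.standard_normal(p)

print("max |residual|, min-norm:", np.max(np.abs(y - X @ beta_hat)))
print("max |residual|, other:   ", np.max(np.abs(y - X @ beta_other)))
print("norms:", np.linalg.norm(beta_hat), np.linalg.norm(beta_other))
```

Both coefficient vectors reproduce `y` up to floating-point error, but only `beta_hat` is orthogonal to the null space of `X`, which is why its norm is the smaller of the two.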
Related papers
- Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics [30.324051162373973]
We consider the estimation of regression coefficients and signal-to-noise ratio in high-dimensional Generalized Linear Models (GLMs).
We derive Consistent and Asymptotically Normal (CAN) estimators of our targets of inference.
We complement our theoretical results with numerical experiments and comparisons with existing literature.
arXiv Detail & Related papers (2024-08-12T12:43:30Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts [40.24720443257405]
We provide a convergence analysis for maximum likelihood estimation (MLE) in the Gaussian-gated MoE model.
Our findings reveal that the MLE has distinct behaviors under two complementary settings of location parameters of the Gaussian gating functions.
Notably, these behaviors can be characterized by the solvability of two different systems of equations.
arXiv Detail & Related papers (2023-05-12T16:02:19Z) - Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HD).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (LASSO) penalty function, wherein the weights are obtained by the unpenalised f-HD maximum-likelihood estimators.
arXiv Detail & Related papers (2022-08-10T19:17:45Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning [37.01683478234978]
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field.
One of the most important riddles is the good empirical generalization of overparameterized models.
arXiv Detail & Related papers (2021-09-06T10:48:40Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts [2.794896499906838]
We consider the class of softmax-gated Gaussian MoE (SGMoE) models with softmax gating functions and Gaussian experts.
To the best of our knowledge, we are the first to investigate the $l_1$-regularization properties of SGMoE models from a non-asymptotic perspective.
We provide a lower bound on the regularization parameter of the Lasso penalty that ensures non-asymptotic theoretical control of the Kullback--Leibler loss of the Lasso estimator for SGMoE models.
arXiv Detail & Related papers (2020-09-22T15:23:35Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Empirical optimization is central to modern machine learning, but its role in that success is still unclear.
We show that heavy tails commonly arise in the parameters from discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe key factors, including step size and data, all of which exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Learning CHARME models with neural networks [1.5362025549031046]
We consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts).
As an application, we develop a learning theory for the NN-based autoregressive functions of the model.
arXiv Detail & Related papers (2020-02-08T21:51:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.