Dimensionality reduction, regularization, and generalization in
overparameterized regressions
- URL: http://arxiv.org/abs/2011.11477v2
- Date: Wed, 20 Oct 2021 02:49:38 GMT
- Title: Dimensionality reduction, regularization, and generalization in
overparameterized regressions
- Authors: Ningyuan Huang and David W. Hogg and Soledad Villar
- Abstract summary: We show that the divergence of the OLS risk can be avoided with a PCA-based dimensionality reduction (PCA-OLS, also known as principal component regression).
We show that dimensionality reduction improves robustness while OLS is arbitrarily susceptible to adversarial attacks.
We find that methods in which the projection depends on the training data can outperform methods where the projections are chosen independently of the training data.
- Score: 8.615625517708324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterization in deep learning is powerful: Very large models fit the
training data perfectly and yet often generalize well. This realization brought
back the study of linear models for regression, including ordinary least
squares (OLS), which, like deep learning, shows a "double-descent" behavior:
(1) The risk (expected out-of-sample prediction error) can grow arbitrarily
when the number of parameters $p$ approaches the number of samples $n$, and (2)
the risk decreases with $p$ for $p>n$, sometimes achieving a lower value than
the lowest risk for $p<n$. The divergence of the risk for OLS can be avoided
with regularization. In this work, we show that for some data models it can
also be avoided with a PCA-based dimensionality reduction (PCA-OLS, also known
as principal component regression). We provide non-asymptotic bounds for the
risk of PCA-OLS by considering the alignments of the population and empirical
principal components. We show that dimensionality reduction improves robustness
while OLS is arbitrarily susceptible to adversarial attacks, particularly in
the overparameterized regime. We compare PCA-OLS theoretically and empirically
with a wide range of projection-based methods, including random projections,
partial least squares (PLS), and certain classes of linear two-layer neural
networks. These comparisons are made for different data generation models to
assess the sensitivity to signal-to-noise and the alignment of regression
coefficients with the features. We find that methods in which the projection
depends on the training data can outperform methods where the projections are
chosen independently of the training data, even those with oracle knowledge of
population quantities, another seemingly paradoxical phenomenon that has been
identified previously. This suggests that overparameterization may not be
necessary for good generalization.
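A minimal, hedged sketch of the comparison described in the abstract (the synthetic latent-factor data and all variable names below are assumptions, not the paper's data models or code): it sweeps the number of parameters p past the number of samples n, fits the minimum-norm OLS interpolator via the pseudoinverse, fits PCA-OLS by projecting onto the top empirical principal components before regressing, and prints the out-of-sample risks. With the signal concentrated in a few population components, the min-norm OLS risk typically spikes near p ≈ n while PCA-OLS stays well behaved.

# Sketch only: compare minimum-norm OLS with PCA-OLS (principal component
# regression) on synthetic latent-factor data as p crosses n.
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 50, 5, 0.5          # samples, retained principal components, noise level

def risks(p):
    """Return (min-norm OLS risk, PCA-OLS risk) for a p-dimensional latent-factor model."""
    W = rng.standard_normal((p, k)) / np.sqrt(p)       # factor loadings (assumed model)
    beta = W @ rng.standard_normal(k)                  # coefficients aligned with the factors

    def sample(m):
        Z = rng.standard_normal((m, k))                              # latent factors
        X = Z @ W.T * 3.0 + 0.3 * rng.standard_normal((m, p))        # spiked features
        return X, X @ beta + sigma * rng.standard_normal(m)

    X_tr, y_tr = sample(n)
    X_te, y_te = sample(2000)

    # Minimum-norm OLS: pseudoinverse (least-norm) solution; interpolates when p > n.
    b_ols = np.linalg.pinv(X_tr) @ y_tr

    # PCA-OLS: project onto the top-k empirical principal components, then run OLS.
    _, _, Vt = np.linalg.svd(X_tr, full_matrices=False)
    Vk = Vt[:k].T                                      # p x k projection learned from training data
    gamma = np.linalg.lstsq(X_tr @ Vk, y_tr, rcond=None)[0]
    b_pcr = Vk @ gamma

    return tuple(float(np.mean((X_te @ b - y_te) ** 2)) for b in (b_ols, b_pcr))

for p in (20, 40, 50, 60, 100, 400):
    r_ols, r_pcr = risks(p)
    print(f"p={p:4d}  min-norm OLS risk={r_ols:9.3f}  PCA-OLS risk={r_pcr:7.3f}")

The projection step is where the training data enters twice (through the empirical principal components and then the regression), which is the data-dependent behavior the abstract contrasts with projections chosen independently of the training data.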
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, within both the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression [29.57766164934947]
We investigate popular resampling methods for estimating the uncertainty of statistical models.
We provide a tight description of the biases and variances estimated by these methods in the context of generalized linear models.
arXiv Detail & Related papers (2024-02-21T08:50:33Z)
- Handling Overlapping Asymmetric Datasets -- A Twice Penalized P-Spline Approach [0.40964539027092917]
This research aims to develop a new method that can model the smaller cohort against a particular response.
We find our twice-penalized approach offers an enhanced fit over a linear B-spline and a once-penalized P-spline approximation.
Applied to a real-life dataset on a person's risk of developing Non-Alcoholic Steatohepatitis, we see an improvement in model fit performance of over 65%.
arXiv Detail & Related papers (2023-11-17T12:41:07Z)
- Transfer Learning with Random Coefficient Ridge Regression [2.0813318162800707]
Ridge regression with random coefficients provides an important alternative to fixed-coefficient regression in the high-dimensional setting.
This paper considers estimation and prediction of random coefficient ridge regression in the setting of transfer learning.
arXiv Detail & Related papers (2023-06-28T04:36:37Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
- Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression [14.493176427999028]
We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models.
We show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner's risk converges to zero if the propensity score is known.
arXiv Detail & Related papers (2022-02-10T18:51:52Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Interpolating Predictors in High-Dimensional Factor Regression [2.1055643409860743]
This work studies finite-sample properties of the risk of the minimum-norm interpolating predictor in high-dimensional regression models.
We show that the min-norm interpolating predictor can have similar risk to predictors based on principal components regression and ridge regression, and can improve over LASSO based predictors, in the high-dimensional regime.
arXiv Detail & Related papers (2020-02-06T22:08:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.