Overfitting Can Be Harmless for Basis Pursuit, But Only to a Degree
- URL: http://arxiv.org/abs/2002.00492v2
- Date: Tue, 17 Nov 2020 23:29:36 GMT
- Title: Overfitting Can Be Harmless for Basis Pursuit, But Only to a Degree
- Authors: Peizhong Ju, Xiaojun Lin, Jia Liu
- Abstract summary: We study the overfitting solution that minimizes the $\ell_1$-norm, known as Basis Pursuit (BP) in the compressed sensing literature.
We show that for a large range of $p$ up to a limit that grows exponentially with the number of samples $n$, with high probability the model error of BP is upper bounded by a value that decreases with $p$.
- Score: 20.408693108187542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been significant interest in studying the so-called
"double-descent" of the generalization error of linear regression models under
the overparameterized and overfitting regime, with the hope that such analysis
may provide the first step towards understanding why overparameterized deep
neural networks (DNN) still generalize well. However, to date most of these
studies focused on the min $\ell_2$-norm solution that overfits the data. In
contrast, in this paper we study the overfitting solution that minimizes the
$\ell_1$-norm, which is known as Basis Pursuit (BP) in the compressed sensing
literature. Under a sparse true linear regression model with $p$ i.i.d.
Gaussian features, we show that for a large range of $p$ up to a limit that
grows exponentially with the number of samples $n$, with high probability the
model error of BP is upper bounded by a value that decreases with $p$. To the
best of our knowledge, this is the first analytical result in the literature
establishing the double-descent of overfitting BP for finite $n$ and $p$.
Further, our results reveal significant differences between the double-descent
of BP and min $\ell_2$-norm solutions. Specifically, the double-descent
upper-bound of BP is independent of the signal strength, and for high SNR and
sparse models the descent-floor of BP can be much lower and wider than that of
min $\ell_2$-norm solutions.
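To make the comparison concrete, here is a minimal sketch (using only NumPy and SciPy) of the two interpolating solutions discussed above on a synthetic sparse model with i.i.d. Gaussian features; the dimensions, sparsity level, noise level, and the LP formulation via scipy.optimize.linprog are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, s, sigma = 50, 400, 5, 0.1          # samples, features (p >> n), sparsity, noise level

# Sparse true linear model with i.i.d. Gaussian features.
beta_true = np.zeros(p)
beta_true[rng.choice(p, size=s, replace=False)] = rng.normal(size=s)
X = rng.normal(size=(n, p))
y = X @ beta_true + sigma * rng.normal(size=n)

# Basis Pursuit (BP): minimize ||beta||_1 subject to X beta = y,
# written as a linear program over the positive/negative parts of beta.
res = linprog(c=np.ones(2 * p),
              A_eq=np.hstack([X, -X]), b_eq=y,
              bounds=(0, None), method="highs")
beta_bp = res.x[:p] - res.x[p:]

# Min l2-norm interpolator: beta = X^T (X X^T)^{-1} y, i.e. the pseudoinverse solution.
beta_l2 = np.linalg.pinv(X) @ y

for name, b in [("BP (min l1-norm)", beta_bp), ("min l2-norm", beta_l2)]:
    print(f"{name}: training residual {np.linalg.norm(X @ b - y):.1e}, "
          f"model error {np.linalg.norm(b - beta_true):.3f}")
```

Both solutions fit the training data exactly (up to solver tolerance); on sparse, high-SNR instances like this one, the BP interpolator's model error is typically much smaller, which is the regime where the abstract's comparison with the min $\ell_2$-norm solution is most pronounced.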
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
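As a purely numerical illustration of the rate above (reading $M$ as the model size, $N$ as the sample size, and $a>1$ as the power-law exponent; these readings are interpretive, since the snippet above does not define them), one can check which term dominates for a given budget:

```python
# Illustrative only: evaluate the two terms of Theta(M^{-(a-1)} + N^{-(a-1)/a})
# for an assumed power-law exponent a and hypothetical model/data sizes M, N.
a = 2.0
for M, N in [(1e3, 1e7), (1e5, 1e7), (1e7, 1e3)]:
    model_term = M ** (-(a - 1))       # shrinks as the model grows
    data_term = N ** (-(a - 1) / a)    # shrinks as the data grows
    print(f"M={M:.0e}, N={N:.0e}: M-term={model_term:.2e}, N-term={data_term:.2e}")
```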
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Asymptotics of Random Feature Regression Beyond the Linear Scaling
Regime [22.666759017118796]
Recent advances in machine learning have been achieved by using overparametrized models trained until they nearly interpolate the training data.
How do model complexity and generalization depend on the number of parameters $p$?
In particular, random feature ridge regression (RFRR) exhibits an intuitive trade-off between approximation and generalization power.
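For context on the snippet above, here is a minimal sketch of random feature ridge regression (RFRR) on a toy task; the ReLU feature map, all dimensions, and the ridge parameter are assumptions for illustration, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, lam = 200, 10, 1000, 1e-3        # samples, input dim, random features, ridge strength

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # toy target

W = rng.normal(size=(d, p)) / np.sqrt(d)  # frozen random first-layer weights
features = np.maximum(X @ W, 0.0)         # ReLU random features

# Ridge regression on the random features (the only trained parameters).
theta = np.linalg.solve(features.T @ features + lam * np.eye(p), features.T @ y)

X_test = rng.normal(size=(1000, d))
pred = np.maximum(X_test @ W, 0.0) @ theta
print("test MSE:", np.mean((pred - np.sin(X_test[:, 0])) ** 2))
```

Sweeping the number of random features $p$ against the sample size $n$ in a sketch like this is the kind of experiment the approximation-generalization trade-off statement refers to.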
arXiv Detail & Related papers (2024-03-13T00:59:25Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates
and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $\epsilon$, these estimators achieve an error rate of $O\big(\max\{|\mathcal{O}|^{1/2} n^{-1/2},\, |\mathcal{I}|^{1/2} n^{-1}\} + \epsilon\big)$, matching the best-known rates in the literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z) - Consistent Estimation for PCA and Sparse Regression with Oblivious
Outliers [13.244654316770815]
We develop machinery to design efficiently computable and consistent estimators.
For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d)/\alpha^2$.
In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix.
arXiv Detail & Related papers (2021-11-04T15:59:44Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev-seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - An Exponential Lower Bound for Linearly-Realizable MDPs with Constant
Suboptimality Gap [66.75488143823337]
We show that an exponential sample complexity lower bound still holds even if a constant suboptimality gap is assumed.
Perhaps surprisingly, this implies an exponential separation between the online RL setting and the generative model setting.
arXiv Detail & Related papers (2021-03-23T17:05:54Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)