Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent
- URL: http://arxiv.org/abs/2110.09502v1
- Date: Mon, 18 Oct 2021 17:51:14 GMT
- Title: Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent
- Authors: Yue Li, Yuting Wei
- Abstract summary: This paper pursues theoretical understanding for an important type of interpolator: the minimum $\ell_1$-norm interpolator.
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon.
Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
- Score: 19.781475554462553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An evolving line of machine learning work observes empirical evidence
suggesting that interpolating estimators -- the ones that achieve zero training
error -- are not necessarily harmful. This paper pursues a theoretical
understanding of an important type of interpolator: the minimum $\ell_{1}$-norm
interpolator, which is motivated by the observation that several learning
algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime.
Concretely, we consider the noisy sparse regression model under Gaussian
design, focusing on linear sparsity and high-dimensional asymptotics (so that
both the number of features and the sparsity level scale proportionally with
the sample size).
We observe, and provide rigorous theoretical justification for, a curious
multi-descent phenomenon; that is, the generalization risk of the minimum
$\ell_1$-norm interpolator undergoes multiple (and possibly more than two)
phases of descent and ascent as one increases the model capacity. This
phenomenon stems from the special structure of the minimum $\ell_1$-norm
interpolator as well as the delicate interplay between the over-parameterized
ratio and the sparsity, thus unveiling a fundamental distinction in geometry
from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact
characterization of the risk behavior, which is governed by a system of two
non-linear equations with two unknowns.
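To make the object of study concrete, below is a minimal simulation sketch (not code from the paper): it computes the minimum $\ell_1$-norm interpolator on a synthetic instance of the noisy sparse regression model with i.i.d. Gaussian design by solving the basis-pursuit linear program $\min \|b\|_1$ subject to $Xb = y$, and reports a simple estimation-error proxy for a few overparameterization ratios. The sample size, sparsity fraction, noise level, and the use of $\|\hat{\beta} - \beta^\star\|_2^2$ as the risk measure are illustrative assumptions of this sketch, not values taken from the paper.
```python
# Illustrative sketch (not the paper's code): minimum l1-norm interpolation
# under Gaussian design.  We solve the basis-pursuit LP
#     minimize ||b||_1  subject to  X b = y
# by splitting b = u - v with u, v >= 0, and report the estimation error
# ||beta_hat - beta_star||^2 for a few overparameterization ratios p/n.
import numpy as np
from scipy.optimize import linprog


def min_l1_interpolator(X, y):
    """Return argmin ||b||_1 subject to X b = y, via a linear program."""
    n, p = X.shape
    c = np.ones(2 * p)              # objective: sum(u) + sum(v) = ||u - v||_1 at optimum
    A_eq = np.hstack([X, -X])       # equality constraints: X (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    assert res.success, res.message
    return res.x[:p] - res.x[p:]


rng = np.random.default_rng(0)
n = 100                # sample size (illustrative choice)
sparsity_frac = 0.1    # linear sparsity: k = 0.1 * n nonzero coefficients
sigma = 0.5            # noise standard deviation (illustrative choice)

for ratio in (1.5, 2.0, 4.0, 8.0, 16.0):           # overparameterization ratio p / n
    p = int(ratio * n)
    k = int(sparsity_frac * n)
    beta_star = np.zeros(p)
    beta_star[:k] = 1.0                             # k-sparse ground truth
    X = rng.standard_normal((n, p)) / np.sqrt(n)    # i.i.d. Gaussian design
    y = X @ beta_star + sigma * rng.standard_normal(n)
    beta_hat = min_l1_interpolator(X, y)            # zero training error by construction
    err = np.sum((beta_hat - beta_star) ** 2)       # estimation-risk proxy for one draw
    print(f"p/n = {ratio:5.1f}   ||beta_hat - beta_star||^2 = {err:.3f}")
```
A single draw per ratio is noisy; tracing the multi-descent curve described above would require averaging over many independent trials and sweeping $p/n$ on a fine grid, whereas the paper pins down the limiting risk exactly through the two-equation system mentioned in the abstract (not reproduced here).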
Related papers
- Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, Section 2 introduces inductive biases that appear to play a key role in double descent by selecting, among the many interpolating solutions, a smoothly interpolating one.
Section 3 explores double descent with two linear models, and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Deep Linear Networks can Benignly Overfit when Shallow Ones Do [16.1176305285103]
We show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant.
Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution.
arXiv Detail & Related papers (2022-09-19T19:23:04Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation [5.98367009147573]
We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model.
We show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the minimax sense.
arXiv Detail & Related papers (2021-11-13T05:19:37Z)
- Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- On the robustness of minimum-norm interpolators [0.0]
This article develops a general theory for minimum-norm interpolated estimators in linear models in the presence of additive, potentially adversarial, errors.
A quantitative bound for the prediction error is given, relating it to the Rademacher norm of the minimum norm interpolator of the errors and the shape of the subdifferential around the true parameter.
arXiv Detail & Related papers (2020-12-01T20:03:20Z)
- A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)