A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers
- URL: http://arxiv.org/abs/2002.01586v3
- Date: Thu, 22 Jul 2021 20:55:22 GMT
- Title: A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers
- Authors: Tengyuan Liang and Pragya Sur
- Abstract summary: This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
- Score: 3.167685495996986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper establishes a precise high-dimensional asymptotic theory for
boosting on separable data, taking statistical and computational perspectives.
We consider a high-dimensional setting where the number of features (weak
learners) $p$ scales with the sample size $n$, in an overparametrized regime.
Under a class of statistical models, we provide an exact analysis of the
generalization error of boosting when the algorithm interpolates the training
data and maximizes the empirical $\ell_1$-margin. Further, we explicitly pin
down the relation between the boosting test error and the optimal Bayes error,
as well as the proportion of active features at interpolation (with zero
initialization). In turn, these precise characterizations answer certain
questions raised in \cite{breiman1999prediction, schapire1998boosting}
surrounding boosting, under assumed data generating processes. At the heart of
our theory lies an in-depth study of the maximum-$\ell_1$-margin, which can be
accurately described by a new system of non-linear equations; to analyze this
margin, we rely on Gaussian comparison techniques and develop a novel uniform
deviation argument. Our statistical and computational arguments can handle (1)
any finite-rank spiked covariance model for the feature distribution and (2)
variants of boosting corresponding to general $\ell_q$-geometry, $q \in [1,
2]$. As a final component, via the Lindeberg principle, we establish a
universality result showcasing that the scaled $\ell_1$-margin (asymptotically)
remains the same, whether the covariates used for boosting arise from a
non-linear random feature model or an appropriately linearized model with
matching moments.
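To make the central object concrete, here is a minimal sketch (not from the paper) of the minimum-$\ell_1$-norm interpolation problem on separable data: the interpolating classifier with the smallest $\ell_1$ norm is, after rescaling, exactly the maximizer of the empirical $\ell_1$-margin, which the abstract identifies as the limit of boosting at interpolation. The Gaussian design, the sparse signal, the problem sizes, and the use of scipy.optimize.linprog are illustrative assumptions.

```python
# Minimal sketch (assumptions: toy Gaussian design, noiseless linear labels, LP solver).
# Solves  min ||theta||_1  s.t.  y_i <x_i, theta> >= 1  for all i,
# whose optimal value is the reciprocal of the maximum l1-margin.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 50, 200                                  # overparametrized regime: p > n
theta_star = np.zeros(p)
theta_star[:5] = 1.0                            # a few informative features (assumption)
X = rng.standard_normal((n, p))
y = np.where(X @ theta_star >= 0, 1.0, -1.0)    # noiseless labels => linearly separable

# LP lift: variables z = [theta, u] with -u <= theta <= u, minimize sum(u).
c = np.concatenate([np.zeros(p), np.ones(p)])
A_margin = np.hstack([-(y[:, None] * X), np.zeros((n, p))])  # -y_i x_i^T theta <= -1
A_abs1 = np.hstack([np.eye(p), -np.eye(p)])                  #  theta - u <= 0
A_abs2 = np.hstack([-np.eye(p), -np.eye(p)])                 # -theta - u <= 0
A_ub = np.vstack([A_margin, A_abs1, A_abs2])
b_ub = np.concatenate([-np.ones(n), np.zeros(2 * p)])
bounds = [(None, None)] * p + [(0, None)] * p                # theta free, u >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

theta_hat = res.x[:p]
l1_norm = np.abs(theta_hat).sum()
print("max l1-margin (scaled):", 1.0 / l1_norm)
print("active features at interpolation:", int(np.sum(np.abs(theta_hat) > 1e-8)))
```

The printed quantities mirror two of the objects the paper characterizes asymptotically: the scaled maximum $\ell_1$-margin and the proportion of active features at interpolation.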
Related papers
- Statistical Inference in Classification of High-dimensional Gaussian Mixture [1.2354076490479515]
We investigate the behavior of a general class of regularized convex classifiers in the high-dimensional limit.
Our focus is on the generalization error and variable selection properties of the estimators.
arXiv Detail & Related papers (2024-10-25T19:58:36Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Single Trajectory Nonparametric Learning of Nonlinear Dynamics [8.438421942654292]
Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE).
We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes.
We specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
arXiv Detail & Related papers (2022-02-16T19:38:54Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points (a toy sketch of this setting appears below).
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
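As a purely illustrative companion to the last entry, the sketch below runs single-pass, fixed step-size SGD on the least-squares risk with noiseless observations of a function on the unit interval; the target function, the random Fourier feature map, and the step size are assumptions of the sketch, not details from that paper.

```python
# Minimal sketch (assumptions: target f, random Fourier feature map, step size) of
# single-pass, fixed step-size SGD on noiseless samples of f at random points of [0, 1].
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)      # target (assumption)

D = 200                                         # number of random features (assumption)
omega = rng.normal(scale=15.0, size=D)          # random frequencies
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)
phi = lambda x: np.sqrt(2.0 / D) * np.cos(np.outer(x, omega) + phase)  # feature map

w = np.zeros(D)
step = 0.5                                      # fixed step size (assumption)
for _ in range(20000):                          # single pass: each sample is used once
    x = rng.uniform(0.0, 1.0)
    feat = phi(np.array([x]))[0]
    resid = feat @ w - f(x)                     # noiseless observation of f(x)
    w -= step * resid * feat                    # SGD step on the least-squares risk

xs = np.linspace(0.0, 1.0, 400)
print("grid mean-squared error:", np.mean((phi(xs) @ w - f(xs)) ** 2))
```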
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.