Generalization error in high-dimensional perceptrons: Approaching Bayes
error with convex optimization
- URL: http://arxiv.org/abs/2006.06560v2
- Date: Sat, 7 Nov 2020 10:41:55 GMT
- Title: Generalization error in high-dimensional perceptrons: Approaching Bayes
error with convex optimization
- Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová
- Abstract summary: We study the generalization performances of standard classifiers in the high-dimensional regime.
We design an optimal loss and regularizer that provably leads to Bayes-optimal generalization error.
- Score: 37.57922952189396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a commonly studied supervised classification of a synthetic
dataset whose labels are generated by feeding a one-layer neural network with
random iid inputs. We study the generalization performances of standard
classifiers in the high-dimensional regime where $\alpha=n/d$ is kept finite in
the limit of a high dimension $d$ and number of samples $n$. Our contribution
is three-fold: First, we prove a formula for the generalization error achieved
by $\ell_2$ regularized classifiers that minimize a convex loss. This formula
was first obtained by the heuristic replica method of statistical physics.
Second, focusing on commonly used loss functions and optimizing the $\ell_2$
regularization strength, we observe that while ridge regression performance is
poor, logistic and hinge regression are surprisingly able to approach the
Bayes-optimal generalization error extremely closely. As $\alpha \to \infty$
they lead to Bayes-optimal rates, a fact that does not follow from predictions
of margin-based generalization error bounds. Third, we design an optimal loss
and regularizer that provably leads to Bayes-optimal generalization error.
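To make the setting concrete, the following is a minimal simulation sketch (illustrative, not the authors' code) of the data model and classifiers discussed in the abstract: iid Gaussian inputs, labels from a sign (perceptron) teacher, and $\ell_2$-regularized ridge, logistic, and hinge classifiers whose test error is estimated at several values of $\alpha = n/d$. The noiseless sign teacher, the fixed regularization strengths, and the use of scikit-learn solvers are assumptions made here for illustration; in particular, the paper optimizes the regularization strength, which this sketch does not.

```python
# Minimal sketch (illustrative, not the paper's code): teacher-student data with
# iid Gaussian inputs and a sign teacher, fitted by l2-regularized convex classifiers.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
d = 200                                   # input dimension
w_star = rng.standard_normal(d)           # teacher weights (assumed Gaussian)

def sample(n):
    X = rng.standard_normal((n, d))       # random iid inputs
    y = np.sign(X @ w_star / np.sqrt(d))  # labels from a one-layer sign teacher
    return X, y

for ratio in (0.5, 1.0, 2.0, 4.0):        # alpha = n / d
    n = int(ratio * d)
    X, y = sample(n)
    X_test, y_test = sample(10_000)       # fresh samples to estimate test error

    # Square loss with l2 penalty (Ridge's `alpha` is the regularization strength,
    # unrelated to the sample ratio alpha = n/d); classify by the sign of the output.
    ridge = Ridge(alpha=1.0).fit(X, y)
    err_ridge = np.mean(np.sign(ridge.predict(X_test)) != y_test)

    # Logistic loss with l2 penalty.
    logit = LogisticRegression(penalty="l2", C=1.0, max_iter=5_000).fit(X, y)
    err_logit = np.mean(logit.predict(X_test) != y_test)

    # Hinge loss with l2 penalty (linear SVM).
    svm = LinearSVC(C=1.0, max_iter=20_000).fit(X, y)
    err_svm = np.mean(svm.predict(X_test) != y_test)

    print(f"alpha={ratio:.1f}: ridge={err_ridge:.3f}, "
          f"logistic={err_logit:.3f}, hinge={err_svm:.3f}")
```

With the regularization strength tuned, the abstract states that logistic and hinge come close to the Bayes-optimal error while ridge does not; the fixed strengths above only illustrate the simulation protocol, not those tuned curves.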
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression [29.57766164934947]
We investigate popular resampling methods for estimating the uncertainty of statistical models.
We provide a tight description of the biases and variances estimated by these methods in the context of generalized linear models.
arXiv Detail & Related papers (2024-02-21T08:50:33Z)
- Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
arXiv Detail & Related papers (2022-12-11T18:03:12Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers [13.244654316770815]
We develop machinery to design efficiently computable and consistent estimators.
For sparse regression, we achieve consistency for optimal sample size $n \gtrsim (k \log d)/\alpha^2$.
In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix.
arXiv Detail & Related papers (2021-11-04T15:59:44Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the implicit bias of first- and second-order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
- Flatness is a False Friend [0.7614628596146599]
Hessian-based measures of flatness have been argued for, used, and shown to relate to generalisation.
We show that for feed-forward neural networks under the cross-entropy loss, we would expect low-loss solutions with large weights to have small Hessian-based measures of flatness.
arXiv Detail & Related papers (2020-06-16T11:55:24Z)
- A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.