Large scale analysis of generalization error in learning using margin
based classification methods
- URL: http://arxiv.org/abs/2007.10112v1
- Date: Thu, 16 Jul 2020 20:31:26 GMT
- Title: Large scale analysis of generalization error in learning using margin
based classification methods
- Authors: Hanwen Huang and Qinglong Yang
- Abstract summary: We derive the expression for the generalization error of a family of large-margin classifiers in the limit where both the sample size $n$ and the dimension $p$ go to infinity at a fixed ratio $\alpha = n/p$.
For two layer neural networks, we reproduce the recently developed 'double descent' phenomenology for several classification models.
- Score: 2.436681150766912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-margin classifiers are popular methods for classification. We derive
the asymptotic expression for the generalization error of a family of
large-margin classifiers in the limit of both sample size $n$ and dimension $p$
going to $\infty$ with fixed ratio $\alpha=n/p$. This family covers a broad
range of commonly used classifiers including support vector machine, distance
weighted discrimination, and penalized logistic regression. Our result can be
used to establish the phase transition boundary for the separability of two
classes. We assume that the data are generated from a single multivariate
Gaussian distribution with arbitrary covariance structure. We explore two
special choices for the covariance matrix: spiked population model and two
layer neural networks with random first layer weights. The method we use to
derive the closed-form expression is the replica method from statistical
physics. Our asymptotic results already match simulations when $n$ and $p$ are
on the order of a few hundred. For two layer neural networks, we reproduce the
recently developed 'double descent' phenomenology for several classification
models. We also discuss some statistical insights that can be drawn from these
analyses.
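To make the setting concrete, below is a minimal Monte Carlo sketch in Python that estimates the generalization error of two classifiers from this family (soft-margin SVM and penalized logistic regression) at fixed ratios $\alpha = n/p$. It assumes a symmetric two-class Gaussian model $x \mid y \sim N(y\mu, I_p)$, a special case of the paper's covariance structure; the signal strength, dimension, and $\alpha$ grid are illustrative choices rather than values from the paper, and the code does not implement the replica calculation itself.

```python
# Minimal simulation sketch (not the paper's replica calculation): estimate the
# test error of two large-margin classifiers at fixed alpha = n / p, under an
# assumed symmetric Gaussian model x | y ~ N(y*mu, I_p). The signal strength,
# dimension, and alpha grid are illustrative assumptions, not paper values.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, p, mu):
    """Draw n labelled points from the two-class Gaussian x | y ~ N(y*mu, I_p)."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, p))
    return X, y

p = 200                                   # dimension (illustrative)
mu = np.full(p, 1.5 / np.sqrt(p))         # class-mean direction with ||mu|| = 1.5

for alpha in [0.5, 1.0, 2.0, 4.0]:
    n = int(alpha * p)
    X_tr, y_tr = sample(n, p, mu)
    X_te, y_te = sample(20000, p, mu)     # large test set to estimate the error

    for name, clf in [("SVM", LinearSVC(C=1.0, max_iter=20000)),
                      ("logistic", LogisticRegression(C=1.0, max_iter=20000))]:
        clf.fit(X_tr, y_tr)
        err = np.mean(clf.predict(X_te) != y_te)
        print(f"alpha={alpha:<4} {name:8s} test error = {err:.3f}")
```

In the paper's framework, empirical error estimates of this kind would be compared against the closed-form asymptotic prediction as a function of $\alpha$, including the location of the phase transition for separability of the two classes.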
Related papers
- How many classifiers do we need? [50.69951049206484]
We provide a detailed analysis of how the disagreement and the polarization among classifiers relate to the performance gain achieved by aggregating individual classifiers.
We prove results for the behavior of the disagreement in terms of the number of classifiers.
Our theories and claims are supported by empirical results on several image classification tasks with various types of neural networks.
arXiv Detail & Related papers (2024-11-01T02:59:56Z) - Statistical Inference in Classification of High-dimensional Gaussian Mixture [1.2354076490479515]
We investigate the behavior of a general class of regularized convex classifiers in the high-dimensional limit.
Our focus is on the generalization error and variable selection properties of the estimators.
arXiv Detail & Related papers (2024-10-25T19:58:36Z) - Universality of max-margin classifiers [10.797131009370219]
We study the role of featurization maps and the high-dimensional universality of the misclassification error for non-Gaussian features.
In particular, the overparametrization threshold and generalization error can be computed within a simpler model.
arXiv Detail & Related papers (2023-09-29T22:45:56Z) - Revisiting Discriminative vs. Generative Classifiers: Theory and
Implications [37.98169487351508]
This paper is inspired by the statistical efficiency of naive Bayes.
We present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss.
Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases.
arXiv Detail & Related papers (2023-02-05T08:30:42Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first precise high-dimensional analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Bayesian Semi-supervised Multi-category Classification under Nonparanormality [2.307581190124002]
Semi-supervised learning is a model training method that uses both labeled and unlabeled data.
This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any multi-category classification problem.
arXiv Detail & Related papers (2020-01-11T21:31:25Z)