Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization
- URL: http://arxiv.org/abs/2011.09148v4
- Date: Wed, 15 Sep 2021 07:27:42 GMT
- Title: Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization
- Authors: Ke Wang, Christos Thrampoulidis
- Abstract summary: We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
- Score: 39.35822033674126
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep neural networks generalize well despite being exceedingly
overparameterized and being trained without explicit regularization. This
curious phenomenon has inspired extensive research activity in establishing its
statistical principles: Under what conditions is it observed? How do these
depend on the data and on the training algorithm? When does regularization
benefit generalization? While such questions remain wide open for deep neural
nets, recent works have attempted gaining insights by studying simpler, often
linear, models. Our paper contributes to this growing line of work by examining
binary linear classification under a generative Gaussian mixture model.
Motivated by recent results on the implicit bias of gradient descent, we study
both max-margin SVM classifiers (corresponding to logistic loss) and min-norm
interpolating classifiers (corresponding to least-squares loss). First, we
leverage an idea introduced in [V. Muthukumar et al., arXiv:2005.08054, (2020)]
to relate the SVM solution to the min-norm interpolating solution. Second, we
derive novel non-asymptotic bounds on the classification error of the latter.
Combining the two, we present novel sufficient conditions on the covariance
spectrum and on the signal-to-noise ratio (SNR) under which interpolating
estimators achieve asymptotically optimal performance as overparameterization
increases. Interestingly, our results extend to a noisy model with constant
probability noise flips. Contrary to previously studied discriminative data
models, our results emphasize the crucial role of the SNR and its interplay
with the data covariance. Finally, via a combination of analytical arguments
and numerical demonstrations we identify conditions under which the
interpolating estimator performs better than corresponding regularized
estimates.
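The following is a minimal numerical sketch (not the authors' code) of the setting described in the abstract: it draws data from a binary Gaussian mixture, fits the min-norm least-squares interpolator and a (near) hard-margin SVM, checks whether all training points are support vectors (the regime in which the two solutions coincide), and evaluates the exact test error of each linear classifier. The dimensions, SNR, and label-flip probability are arbitrary illustrative choices, not values from the paper.

```python
# Illustrative sketch of binary classification under a Gaussian mixture model,
# comparing the max-margin SVM with the min-norm interpolator.
import numpy as np
from scipy.stats import norm
from sklearn.svm import SVC  # a large C approximates the hard-margin SVM

rng = np.random.default_rng(0)

n, p = 50, 2000        # n training samples, p features (overparameterized regime)
snr = 4.0              # signal strength ||mu|| (illustrative choice)
mu = np.zeros(p)
mu[0] = snr

# Gaussian mixture model: y ~ Unif{-1,+1}, x = y*mu + z with z ~ N(0, I_p)
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.standard_normal((n, p))

# Constant-probability label flips on the training labels (set to 0 for the noiseless model)
flip_prob = 0.05
y_train = np.where(rng.random(n) < flip_prob, -y, y)

# Min-norm interpolator of the +/-1 labels (implicit bias of least squares)
w_mni = np.linalg.pinv(X) @ y_train

# Max-margin SVM (implicit bias of gradient descent on the logistic loss)
svm = SVC(kernel="linear", C=1e6).fit(X, y_train)
w_svm = svm.coef_.ravel()

# "Abundance of support vectors": when all n points are support vectors,
# the SVM direction coincides with that of the min-norm interpolator.
cos = w_svm @ w_mni / (np.linalg.norm(w_svm) * np.linalg.norm(w_mni))
print(f"support vectors: {len(svm.support_)}/{n}")
print(f"cosine(SVM, min-norm interpolator): {cos:.4f}")

# Test error of a linear rule sign(<w, x>) under this model with isotropic
# noise and clean test labels: P(error) = Phi(-<w, mu> / ||w||).
for name, w in [("min-norm interpolator", w_mni), ("max-margin SVM", w_svm)]:
    err = norm.cdf(-(w @ mu) / np.linalg.norm(w))
    print(f"{name:22s} test error: {err:.4f}")
```

With p much larger than n, the printed support-vector count is typically n/n and the cosine is close to 1, which is the coincidence of the two solutions that the paper exploits before bounding the error of the min-norm interpolator.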
Related papers
- Generalization error of min-norm interpolators in transfer learning [2.7309692684728617]
Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms.
In many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolators in this setting are not well understood.
We establish a novel anisotropic local law to achieve these characterizations.
arXiv Detail & Related papers (2024-06-20T02:23:28Z) - Max-affine regression via first-order methods [7.12511675782289]
The max-affine model ubiquitously arises in applications in signal processing and statistics.
We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for max-affine regression.
arXiv Detail & Related papers (2023-08-15T23:46:44Z) - Precise Asymptotic Generalization for Multiclass Classification with
Overparameterized Linear Models [4.093769373833101]
We resolve the conjecture posed in Subramanian et al.'22, where the number of data points, features, and classes all grow together.
Our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically.
The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels.
arXiv Detail & Related papers (2023-06-23T00:59:15Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Optimal regularizations for data generation with probabilistic graphical
models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Benign Overfitting in Multiclass Classification: All Roads Lead to
Interpolation [39.02017410837255]
We study benign overfitting in multiclass linear classification.
We consider several training algorithms on separable data.
We derive novel bounds on the accuracy of the MNI classifier.
arXiv Detail & Related papers (2021-06-21T05:34:36Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.