Benign Overfitting in Multiclass Classification: All Roads Lead to
Interpolation
- URL: http://arxiv.org/abs/2106.10865v3
- Date: Tue, 11 Jul 2023 19:04:17 GMT
- Title: Benign Overfitting in Multiclass Classification: All Roads Lead to
Interpolation
- Authors: Ke Wang, Vidya Muthukumar, Christos Thrampoulidis
- Abstract summary: We study benign overfitting in multiclass linear classification.
We consider the following training algorithms on separable data.
We derive novel bounds on the accuracy of the MNI classifier.
- Score: 39.02017410837255
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The literature on "benign overfitting" in overparameterized models has been
mostly restricted to regression or binary classification; however, modern
machine learning operates in the multiclass setting. Motivated by this
discrepancy, we study benign overfitting in multiclass linear classification.
Specifically, we consider the following training algorithms on separable data:
(i) empirical risk minimization (ERM) with cross-entropy loss, which converges
to the multiclass support vector machine (SVM) solution; (ii) ERM with
least-squares loss, which converges to the min-norm interpolating (MNI)
solution; and, (iii) the one-vs-all SVM classifier. First, we provide a simple
sufficient deterministic condition under which all three algorithms lead to
classifiers that interpolate the training data and have equal accuracy. When
the data is generated from Gaussian mixtures or a multinomial logistic model,
this condition holds under high enough effective overparameterization. We also
show that this sufficient condition is satisfied under "neural collapse", a
phenomenon that is observed in training deep neural networks. Second, we derive
novel bounds on the accuracy of the MNI classifier, thereby showing that all
three training algorithms lead to benign overfitting under sufficient
overparameterization. Ultimately, our analysis shows that good generalization
is possible for SVM solutions beyond the realm in which typical margin-based
bounds apply.
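To make the comparison concrete, the sketch below contrasts two of the three classifiers from the abstract, the min-norm interpolating (MNI) solution of one-hot least squares and a one-vs-all SVM, on overparameterized Gaussian-mixture data. The dimensions, class means, and the use of scikit-learn's LinearSVC are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (assumptions noted above): compare the min-norm interpolating (MNI)
# classifier from one-hot least squares with a one-vs-all SVM on Gaussian-mixture
# data in the overparameterized regime d >> n.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, d, k = 60, 2000, 3                                    # n samples, d features, k classes
means = rng.normal(scale=3.0, size=(k, d)) / np.sqrt(d)  # illustrative class means
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, d))                   # identity-covariance mixture

# (ii) ERM with least-squares loss: min-norm interpolator W = pinv(X) @ Y_onehot.
Y = np.eye(k)[y]
W_mni = np.linalg.pinv(X) @ Y
print("least-squares fit interpolates one-hot labels:", np.allclose(X @ W_mni, Y))

# (iii) one-vs-all SVM; a large C approximates the hard-margin solution.
svm = LinearSVC(C=1e6, fit_intercept=False, max_iter=100_000).fit(X, y)

# Fresh samples from the same mixture: under sufficient overparameterization the
# paper shows the classifiers coincide; here we simply report how often they agree.
n_test = 200
y_test = rng.integers(0, k, size=n_test)
X_test = means[y_test] + rng.normal(size=(n_test, d))
pred_mni = (X_test @ W_mni).argmax(axis=1)
pred_svm = svm.predict(X_test)
print("MNI test accuracy:", np.mean(pred_mni == y_test))
print("SVM test accuracy:", np.mean(pred_svm == y_test))
print("MNI/SVM agreement on test data:", np.mean(pred_mni == pred_svm))
```

When d is much larger than n, both fits typically interpolate the training labels and their predictions largely agree, which is the equal-accuracy behaviour that the paper's sufficient deterministic condition formalizes.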
Related papers
- Rethinking generalization of classifiers in separable classes scenarios and over-parameterized regimes [0.0]
We show that in separable-class scenarios the proportion of "bad" global minima diminishes exponentially with the number of training samples n.
We propose a model for the density distribution of the true error, yielding learning curves that align with experiments on MNIST and CIFAR-10.
arXiv Detail & Related papers (2024-10-22T10:12:57Z) - Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms [22.79595679373698]
Mixed linear regression is a well-studied problem in statistics and machine learning.
In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples.
We show that the AM and EM algorithms lead to learning of mixed linear regression by converging to the population loss minimizers (a minimal alternating-minimization sketch appears after this list).
arXiv Detail & Related papers (2024-06-03T09:43:24Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandits and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Robustifying Binary Classification to Adversarial Perturbation [45.347651499585055]
In this paper we consider the problem of binary classification with adversarial perturbations.
We introduce a generalization of the max-margin classifier, the robust max-margin (RM) classifier, which takes into account the power of the adversary in manipulating the data.
Under some mild assumptions on the loss function, we theoretically show that gradient descent converges in direction to the RM classifier.
arXiv Detail & Related papers (2020-10-29T07:20:37Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning [63.64636047748605]
We develop a new theoretical framework to provide convergence guarantee for the general multi-step MAML algorithm.
In particular, our results suggest that the inner-stage step size needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence.
arXiv Detail & Related papers (2020-02-18T19:17:54Z)
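As referenced in the mixed linear regression entry above, the following is a minimal alternating-minimization (AM) sketch for a two-component mixed linear regression. The data generation, initialization, and iteration count are illustrative assumptions, not the cited paper's algorithmic details.

```python
# Minimal AM sketch (illustrative assumptions, not the cited paper's setup):
# alternate between assigning each sample to the better-fitting component and
# refitting each component by ordinary least squares.
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 10
beta_true = np.stack([rng.normal(size=d), -rng.normal(size=d)])  # two regressors
X = rng.normal(size=(n, d))
z = rng.integers(0, 2, size=n)                                   # latent component labels
y = np.einsum("ij,ij->i", X, beta_true[z]) + 0.1 * rng.normal(size=n)

beta = rng.normal(size=(2, d))                                   # random initialization
for _ in range(50):
    # Assignment step: pick the component with the smaller squared residual.
    assign = ((y[:, None] - X @ beta.T) ** 2).argmin(axis=1)
    # Minimization step: refit each component on its assigned samples.
    for c in (0, 1):
        idx = assign == c
        if idx.sum() >= d:
            beta[c], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

# Up to swapping the two components, the estimates should typically approach the
# true regressors (AM can stall for unlucky initializations).
err = min(np.linalg.norm(beta - beta_true), np.linalg.norm(beta - beta_true[::-1]))
print("parameter error up to permutation:", err)
```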