Related papers: Rethinking generalization of classifiers in separable classes scenarios and over-parameterized regimes

URL: http://arxiv.org/abs/2410.16868v1
Date: Tue, 22 Oct 2024 10:12:57 GMT
Title: Rethinking generalization of classifiers in separable classes scenarios and over-parameterized regimes
Authors: Julius Martinetz, Christoph Linse, Thomas Martinetz
Abstract summary: We show that in separable classes scenarios the proportion of "bad" global minima diminishes exponentially with the number of training data n. We propose a model for the density distribution of the true error, yielding learning curves that align with experiments on MNIST and CIFAR-10.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the learning dynamics of classifiers in scenarios where classes are separable or classifiers are over-parameterized. In both cases, Empirical Risk Minimization (ERM) results in zero training error. However, there are many global minima with a training error of zero, some of which generalize well and some of which do not. We show that in separable classes scenarios the proportion of "bad" global minima diminishes exponentially with the number of training data n. Our analysis provides bounds and learning curves dependent solely on the density distribution of the true error for the given classifier function set, irrespective of the set's size or complexity (e.g., number of parameters). This observation may shed light on the unexpectedly good generalization of over-parameterized Neural Networks. For the over-parameterized scenario, we propose a model for the density distribution of the true error, yielding learning curves that align with experiments on MNIST and CIFAR-10.
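The mechanism behind the exponential decay can be sketched numerically. A classifier with true error ε labels each of n i.i.d. training points correctly with probability 1 − ε, so it reaches zero training error with probability (1 − ε)^n. Weighting a density p(ε) of true errors by this factor gives the distribution of true errors among the zero-training-error solutions.

The Python sketch below is only an illustration. It assumes a Beta(2, 5) density for p(ε), which is not the model the paper proposes for the over-parameterized case, and it integrates with a midpoint rule. The function names are hypothetical.

```python
import math


def beta_density(eps, a=2.0, b=5.0):
    """Assumed density p(eps) of the true error over the classifier set (illustrative)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return eps ** (a - 1) * (1.0 - eps) ** (b - 1) / norm


def weighted_grid(n, density, grid_size=10_000):
    """Midpoint grid on (0, 1) with weights p(eps) * (1 - eps)**n * h.

    (1 - eps)**n is the probability that a classifier with true error eps
    makes no mistake on n i.i.d. training points.
    """
    h = 1.0 / grid_size
    grid = [(i + 0.5) * h for i in range(grid_size)]
    weights = [density(e) * (1.0 - e) ** n * h for e in grid]
    return grid, weights


def fraction_bad(n, eps0, density=beta_density):
    """Share of zero-training-error solutions whose true error exceeds eps0."""
    grid, w = weighted_grid(n, density)
    return sum(wi for e, wi in zip(grid, w) if e > eps0) / sum(w)


def expected_error(n, density=beta_density):
    """Learning curve: mean true error among zero-training-error solutions."""
    grid, w = weighted_grid(n, density)
    return sum(e * wi for e, wi in zip(grid, w)) / sum(w)


for n in (0, 10, 50, 100):
    print(n, fraction_bad(n, eps0=0.1), expected_error(n))
```

With this particular density, the weighted density is again a Beta density, Beta(2, 5 + n). The learning curve therefore has the closed form 2/(7 + n), which gives a quick check on the numerics. The fraction of "bad" solutions with ε > 0.1 shrinks roughly like 0.9^n, up to a polynomial factor.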

Related papers

Just How Flexible are Neural Networks in Practice?
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters. In practice, however, we only find solutions accessible via our training procedure (including the gradient-based optimizer and regularizers), which limits flexibility.
arXiv (2024-06-17T12:24:45Z)
A Statistical Model for Predicting Generalization in Few-Shot Classification
We introduce a Gaussian model of the feature distribution to predict the generalization error. We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.
arXiv (2022-12-13T10:21:15Z)
Do highly over-parameterized neural networks generalize since bad solutions are rare?
For highly over-parameterized neural networks, Empirical Risk Minimization (ERM) leads to zero training error. We show that under certain conditions the fraction of "bad" global minima with a true error larger than $\varepsilon$ decays to zero exponentially fast with the number of training data n.
arXiv (2022-11-07T14:02:07Z)
Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
We study benign overfitting in multiclass linear classification and consider several training algorithms on separable data. We derive novel bounds on the accuracy of the minimum-norm interpolating (MNI) classifier.
arXiv (2021-06-21T05:34:36Z)
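The MNI classifier in the entry above can be illustrated under one assumption: labels are one-hot encoded. Under that assumption, the minimum-norm interpolating solution of multiclass least squares is the weight matrix W of smallest Frobenius norm with XW = Y. When the n × d data matrix X has full row rank (d ≥ n, the over-parameterized case), this is W = Xᵀ(XXᵀ)⁻¹Y.

The pure-Python sketch below computes W by solving the n × n Gram system. The toy data and helper names are hypothetical, and this is not the paper's code.

```python
def solve(a, b):
    """Solve A Z = B for square A (list of rows) by Gauss-Jordan elimination
    with partial pivoting; B has one column per class."""
    n = len(a)
    m = [row[:] + brow[:] for row, brow in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Gram matrix is singular; X needs full row rank")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mni_classifier(x, labels, num_classes):
    """Minimum-norm W (d x k) with X W = Y, where Y is the one-hot label matrix."""
    y = [[1.0 if c == lab else 0.0 for c in range(num_classes)] for lab in labels]
    gram = [[sum(u * v for u, v in zip(xi, xj)) for xj in x] for xi in x]
    alpha = solve(gram, y)  # n x k coefficients; W = X^T alpha
    d = len(x[0])
    return [[sum(x[i][j] * alpha[i][c] for i in range(len(x)))
             for c in range(num_classes)] for j in range(d)]


def predict(w, point):
    """Class with the largest linear score."""
    scores = [sum(p * w[j][c] for j, p in enumerate(point)) for c in range(len(w[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Over-parameterized toy problem: n = 3 points in d = 5 dimensions, 3 classes.
X = [[1.0, 0.2, 0.0, 0.5, -0.3],
     [0.1, 1.0, 0.4, 0.0, 0.2],
     [-0.2, 0.3, 1.0, 0.1, 0.6]]
labels = [0, 1, 2]
W = mni_classifier(X, labels, 3)
print([predict(W, x) for x in X])  # prints [0, 1, 2]
```

Because W interpolates, each training point's score vector equals its one-hot label, so the training labels are recovered exactly. How such a W behaves on new data is the question the entry's bounds address.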
Predicting Unreliable Predictions by Shattering a Neural Network
Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. The empirical error of the full network can then be written as an expectation over its subfunctions.
arXiv (2021-06-15T18:34:41Z)
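The decomposition in the entry above follows from grouping. Within one activation pattern, a ReLU network is a single affine function. We assume each subfunction is weighted by the share of training points in its domain. Under that assumption, the network's empirical error is the weighted average of the per-subfunction errors.

A small pure-Python check, using a hypothetical one-hidden-layer network and made-up data:

```python
from collections import defaultdict

# Hypothetical 2-input, 3-unit ReLU network with a scalar output (sign = class).
W1 = [[1.0, -1.0], [0.5, 1.0], [-1.0, 0.3]]
B1 = [0.0, -0.2, 0.1]
W2 = [1.0, -0.8, 0.6]
B2 = 0.05


def activation_pattern(x):
    """Which hidden ReLUs are active; this fixes the affine subfunction used at x."""
    return tuple(int(sum(w * xi for w, xi in zip(row, x)) + b > 0)
                 for row, b in zip(W1, B1))


def forward(x):
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    return sum(v * h for v, h in zip(W2, hidden)) + B2


def predict(x):
    return 1 if forward(x) > 0 else -1


data = [([0.9, 0.1], 1), ([-0.5, 0.8], -1), ([0.2, -0.7], 1),
        ([-0.9, -0.4], -1), ([0.4, 0.9], -1), ([0.7, -0.2], -1)]

# Group training points by the subfunction (activation pattern) they fall into.
groups = defaultdict(list)
for x, y in data:
    groups[activation_pattern(x)].append((x, y))

n = len(data)
full_error = sum(predict(x) != y for x, y in data) / n
# Expectation over subfunctions: weight each subfunction's error by its share of points.
decomposed = sum(
    (len(pts) / n) * (sum(predict(x) != y for x, y in pts) / len(pts))
    for pts in groups.values()
)
print(len(groups), full_error, decomposed)
```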
Label-Imbalanced and Group-Sensitive Classification under Overparameterization
Label-imbalanced and group-sensitive classification seeks to modify standard training algorithms so that they optimize the relevant metrics. We show that a logit-adjusted loss modification to standard empirical risk minimization might be ineffective in general. The results extend naturally to binary classification with sensitive groups, thus treating the two common types of imbalance (label and group) in a unified way.
arXiv (2021-03-02T08:09:43Z)
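As a reference point for the entry above, consider a widely used form of logit-adjusted cross-entropy. It adds τ·log π_y to the logit of each class y before the softmax, where π_y is the class prior and τ ≥ 0. This form is an assumption taken from the broader logit-adjustment literature, and the variant analyzed in the paper may differ.

A pure-Python sketch with hypothetical names:

```python
import math


def logit_adjusted_ce(logits, label, priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class.

    Rare classes receive a more negative shift, so the model must raise
    their logits further to reach the same loss.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    m = max(adjusted)  # log-sum-exp with max subtraction for numerical stability
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]


def plain_ce(logits, label):
    """Ordinary cross-entropy: tau = 0 switches the adjustment off."""
    return logit_adjusted_ce(logits, label, [1.0] * len(logits), tau=0.0)


priors = [0.9, 0.1]  # hypothetical 9:1 class imbalance
logits = [0.0, 0.0]  # an undecided model
print(plain_ce(logits, 1),
      logit_adjusted_ce(logits, 1, priors),
      logit_adjusted_ce(logits, 0, priors))
```

With equal logits and a 9:1 prior, the adjusted loss is log 10 for the minority class but only −log 0.9 for the majority class. Training therefore pushes minority logits up harder. The entry's claim is that this kind of modification might still be ineffective in general.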
On the Minimal Error of Empirical Risk Minimization
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression. Our sharp lower bounds shed light on whether ERM can (or cannot) adapt to the simplicity of the model generating the data.
arXiv (2021-02-24T04:47:55Z)
Learning by Minimizing the Sum of Ranked Range
We introduce the sum of ranked range (SoRR) as a general approach to forming learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. We explore two machine learning applications of minimizing the SoRR: the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv (2020-10-05T01:58:32Z)
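The SoRR idea from the entry above can be sketched under our reading of the definition. Sort the values in descending order, s_[1] ≥ s_[2] ≥ ..., and sum ranks m + 1 through k. This equals the sum of the top k values minus the sum of the top m.

Averaging over the k − m ranks gives the AoRR aggregate loss. It ignores the m largest losses (for example, outliers) as well as everything below rank k. The loss values below are made up.

```python
def sum_ranked_range(values, m, k):
    """Sum of the (m+1)-th through k-th largest values, for 0 <= m < k <= len(values)."""
    if not 0 <= m < k <= len(values):
        raise ValueError("need 0 <= m < k <= len(values)")
    ranked = sorted(values, reverse=True)
    return sum(ranked[m:k])


def aorr_loss(losses, m, k):
    """Average of ranked range: skip the m largest losses, average the next k - m."""
    return sum_ranked_range(losses, m, k) / (k - m)


losses = [0.1, 5.0, 0.7, 0.3, 2.0, 0.05]
print(sum_ranked_range(losses, 1, 4))  # sums the ranked values 2.0, 0.7 and 0.3
print(aorr_loss(losses, 1, 4))
```

Writing SoRR as a difference of two top-k sums is what makes it convenient to optimize. Here it is simply computed by sorting.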
Good Classifiers are Abundant in the Interpolating Regime
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv (2020-06-22T21:12:31Z)
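Such a distribution can be seen crudely in a toy setting, which is not the paper's precise methodology. Take a uniform grid of directions for linear classifiers through the origin, keep only those with zero training error, and record their true errors.

This sketch assumes 2-D isotropic Gaussian inputs labeled by a hypothetical ground-truth direction TRUE_ANGLE. For such inputs, two half-planes through the origin disagree on a fraction (angle between their normals)/π of the data, so the true error is computed exactly rather than on a test set.

```python
import math
import random

TRUE_ANGLE = 0.3  # hypothetical ground-truth normal direction (radians)


def label(angle_of_normal, x):
    """Sign of the projection of x onto the unit normal at the given angle."""
    s = math.cos(angle_of_normal) * x[0] + math.sin(angle_of_normal) * x[1]
    return 1 if s > 0 else -1


def true_error(theta):
    """Exact error for isotropic Gaussian inputs: angle between normals / pi."""
    diff = abs((theta - TRUE_ANGLE + math.pi) % (2 * math.pi) - math.pi)
    return diff / math.pi


def interpolating_errors(n_train, grid_size=20000, seed=0):
    """True errors of all grid directions that fit the n_train points perfectly."""
    rng = random.Random(seed)
    train = []
    for _ in range(n_train):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        train.append((x, label(TRUE_ANGLE, x)))
    errors = []
    for i in range(grid_size):
        theta = 2 * math.pi * i / grid_size  # uniform grid over directions
        if all(label(theta, x) == y for x, y in train):
            errors.append(true_error(theta))
    return errors


for n in (5, 20, 80):
    errs = interpolating_errors(n)
    if errs:  # very small version spaces may contain no grid point
        print(n, len(errs), sum(errs) / len(errs), max(errs))
```

Every run uses the same seed, so the training set for a larger n extends the one for a smaller n. The set of interpolating directions can therefore only shrink as n grows. The printed mean and maximum errors contrast typical and worst-case interpolators in this toy setting.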
An Investigation of Why Overparameterization Exacerbates Spurious Correlations
We identify two key properties of the training data that cause overparameterization to exacerbate spurious correlations. We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv (2020-05-09T01:59:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.