Revisiting Discriminative vs. Generative Classifiers: Theory and
Implications
- URL: http://arxiv.org/abs/2302.02334v2
- Date: Mon, 29 May 2023 07:50:13 GMT
- Title: Revisiting Discriminative vs. Generative Classifiers: Theory and
Implications
- Authors: Chenyu Zheng, Guoqiang Wu, Fan Bao, Yue Cao, Chongxuan Li, Jun Zhu
- Abstract summary: This paper is inspired by the statistical efficiency of naive Bayes.
We present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for the logistic loss.
Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases.
- Score: 37.98169487351508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large-scale deep model pre-trained on massive labeled or unlabeled data
transfers well to downstream tasks. Linear evaluation freezes parameters in the
pre-trained model and trains a linear classifier separately, which is efficient
and attractive for transfer. However, little work has investigated the
classifier in linear evaluation except for the default logistic regression.
Inspired by the statistical efficiency of naive Bayes, the paper revisits the
classical topic on discriminative vs. generative classifiers. Theoretically,
the paper considers the surrogate loss instead of the zero-one loss in analyses
and generalizes the classical results from binary cases to multiclass ones. We
show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$
samples to approach its asymptotic error while the corresponding multiclass
logistic regression requires $O(n)$ samples, where $n$ is the feature
dimension. To establish it, we present a multiclass $\mathcal{H}$-consistency
bound framework and an explicit bound for the logistic loss, which are of
independent interest. Simulation results on a mixture of Gaussians validate our
theoretical findings. Experiments on various pre-trained deep vision models
show that naive Bayes consistently converges faster as the amount of data
increases. Besides, naive Bayes shows promise in few-shot cases, and we observe
the "two regimes" phenomenon in pre-trained supervised models. Our code is
available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers.
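As a concrete illustration of the linear-evaluation protocol the abstract describes (freeze the pre-trained model, train only a linear classifier on its features), here is a minimal sketch. It is not the authors' code (that lives in the linked repository); the ResNet-50 backbone, the image-folder path, and the batch size are illustrative assumptions.

```python
# Minimal linear-evaluation sketch (not the authors' code; see the GitHub link above).
# Assumptions: a torchvision ResNet-50 backbone and a generic ImageFolder dataset path.
import torch
from torchvision import datasets, models
from sklearn.linear_model import LogisticRegression

weights = models.ResNet50_Weights.DEFAULT
backbone = models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()          # drop the head: the model now outputs 2048-d features
backbone.eval()                            # parameters stay frozen

dataset = datasets.ImageFolder("path/to/downstream/data", transform=weights.transforms())
loader = torch.utils.data.DataLoader(dataset, batch_size=64)

features, labels = [], []
with torch.no_grad():                      # linear evaluation never backpropagates into the backbone
    for x, y in loader:
        features.append(backbone(x))
        labels.append(y)
X = torch.cat(features).numpy()
y = torch.cat(labels).numpy()

# The default choice in linear evaluation is logistic regression (discriminative);
# the paper's point is that a generative classifier such as naive Bayes can need far fewer samples.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```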
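The mixture-of-Gaussians simulation can likewise be sketched in a few lines: compare how quickly Gaussian naive Bayes (generative) and logistic regression (discriminative) approach their asymptotic test error as the training set grows. This is a toy reproduction under assumed settings, not the paper's experiment; the dimension, class means, and sample sizes are made-up values.

```python
# Toy sketch of the mixture-of-Gaussians comparison (illustrative settings, not the paper's).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, dim = 3, 50                      # multiclass setting; dim plays the role of the feature dimension n
means = rng.normal(scale=1.5, size=(n_classes, dim))

def sample(n_per_class):
    """Draw class-conditional Gaussian data with identity covariance."""
    X = np.vstack([means[c] + rng.normal(size=(n_per_class, dim)) for c in range(n_classes)])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

X_test, y_test = sample(2000)               # large held-out set approximates the asymptotic error

for n_per_class in (5, 10, 25, 50, 100, 500):
    X_train, y_train = sample(n_per_class)
    nb_err = 1 - GaussianNB().fit(X_train, y_train).score(X_test, y_test)
    lr_err = 1 - LogisticRegression(max_iter=2000).fit(X_train, y_train).score(X_test, y_test)
    print(f"n/class={n_per_class:4d}  naive Bayes err={nb_err:.3f}  logistic err={lr_err:.3f}")
```

In line with the $O(\log n)$ vs. $O(n)$ result, the generative classifier should typically be close to its final error at small sample sizes, while logistic regression catches up only as the training set grows.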
Related papers
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning in linear models for both regression and binary classification.
We provide an exact and rigorous analysis and relate generalization errors (in regression) and classification errors (in binary classification) for the pretrained and fine-tuned models.
arXiv Detail & Related papers (2024-10-03T03:09:09Z)
- Regularized Linear Regression for Binary Classification [20.710343135282116]
Regularized linear regression is a promising approach for binary classification problems in which the training set has noisy labels.
We show that for large enough regularization strength, the optimal weights concentrate around two values of opposite sign.
We observe that in many cases the corresponding "compression" of each weight to a single bit leads to very little loss in performance.
arXiv Detail & Related papers (2023-11-03T23:18:21Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator.
We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Estimating Stochastic Linear Combination of Non-linear Regressions Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large, too much estimation accuracy is sacrificed.
To the best of our knowledge, this is the first work that provides such guarantees for the stochastic linear combination of non-linear regressions model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Large scale analysis of generalization error in learning using margin based classification methods [2.436681150766912]
We derive the expression for the generalization error of a family of large-margin classifiers in the limit where both the sample size $n$ and the dimension $p$ grow large.
For two-layer neural networks, we reproduce the recently developed 'double descent' phenomenology for several classification models.
arXiv Detail & Related papers (2020-07-16T20:31:26Z)