Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
- URL: http://arxiv.org/abs/2011.07729v1
- Date: Mon, 16 Nov 2020 05:17:29 GMT
- Title: Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
- Authors: Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi
- Abstract summary: We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
- Score: 82.80085730891126
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Contemporary machine learning applications often involve classification tasks
with many classes. Despite their extensive use, a precise understanding of the
statistical properties and behavior of classification algorithms is still
missing, especially in modern regimes where the number of classes is rather
large. In this paper, we take a step in this direction by providing the first
asymptotically precise analysis of linear multiclass classification. Our
theoretical analysis allows us to precisely characterize how the test error
varies over different training algorithms, data distributions, problem
dimensions as well as the number of classes, inter-/intra-class correlations, and
class priors. Specifically, our analysis reveals that the classification
accuracy is highly distribution-dependent with different algorithms achieving
optimal performance for different data distributions and/or training/feature
sizes. Unlike linear regression/binary classification, the test error in
multiclass classification relies on intricate functions of the trained model
(e.g., correlation between some of the trained weights) whose asymptotic
behavior is difficult to characterize. This challenge is already present in
simple classifiers, such as those minimizing a square loss. Our novel
theoretical techniques allow us to overcome some of these challenges. The
insights gained may pave the way for a precise understanding of other
classification algorithms beyond those studied in this paper.
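To make the setting concrete, the sketch below fits the kind of classifier the abstract analyzes: a linear multiclass model trained by minimizing a square loss against one-hot labels, with its test error estimated empirically. The synthetic Gaussian class-conditional data, dimensions, and uniform priors are illustrative assumptions, not the paper's exact asymptotic setup.

```python
# Minimal sketch: square-loss linear multiclass classification on synthetic
# Gaussian data. All sizes and the data model below are assumptions for
# illustration, not the paper's precise regime.
import numpy as np

rng = np.random.default_rng(0)
k, d, n_train, n_test = 4, 50, 200, 2000   # classes, features, sample sizes

means = rng.normal(size=(k, d))            # one Gaussian mean per class

def sample(n):
    y = rng.integers(k, size=n)            # uniform class priors (assumption)
    x = means[y] + rng.normal(size=(n, d))
    return x, y

X, y = sample(n_train)
Y = np.eye(k)[y]                           # one-hot targets

# Square-loss training: W = argmin_W ||X W - Y||_F^2 (least squares).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Multiclass decision rule: predict the class with the largest score.
Xt, yt = sample(n_test)
test_error = np.mean((Xt @ W).argmax(axis=1) != yt)
print(f"estimated test error: {test_error:.3f}")
```

Note how the decision depends jointly on all k weight vectors through the argmax; this coupling is the source of the intricate test-error functionals the abstract mentions.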
Related papers
- Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers [26.844410679685424]
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.
We show that the few-shot error of the learned feature map on new classes is small in the case of class-feature-variability collapse.
arXiv Detail & Related papers (2022-12-23T18:46:05Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
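As a rough illustration of such an entropy-regularised objective, the sketch below adds a mean-entropy term to a standard cross-entropy loss; this is a generic formulation, and the regulariser, its weight `lam`, and the toy inputs are assumptions rather than the authors' exact method.

```python
# Hedged sketch of an entropy-regularised classification loss. Maximising the
# entropy of the batch-averaged prediction discourages the classifier from
# collapsing onto a few (e.g., already-known) classes.
import torch
import torch.nn.functional as F

def entropy_regularised_loss(logits, labels, lam=1.0):
    ce = F.cross_entropy(logits, labels)              # supervised term
    mean_prob = logits.softmax(dim=1).mean(dim=0)     # average prediction
    entropy = -(mean_prob * torch.log(mean_prob + 1e-8)).sum()
    return ce - lam * entropy                         # reward high entropy

logits = torch.randn(8, 10)                # toy batch: 8 samples, 10 classes
labels = torch.randint(10, (8,))
print(entropy_regularised_loss(logits, labels))
```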
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
- Multi-class Classification with Fuzzy-feature Observations: Theory and Algorithms [36.810603503167755]
We propose a novel framework to address a new realistic problem called multi-class classification with imprecise observations (MCIMO).
First, we give the theoretical analysis of the MCIMO problem based on fuzzy Rademacher complexity.
Then, two practical algorithms based on support vector machine and neural networks are constructed to solve the proposed new problem.
arXiv Detail & Related papers (2022-06-09T07:14:00Z)
- Determination of class-specific variables in nonparametric multiple-class classification [0.0]
We propose a probability-based nonparametric multiple-class classification method and integrate it with the ability to identify high-impact variables for individual classes.
We describe the properties of the proposed method and use both synthetic and real data sets to illustrate its behavior under different classification situations.
arXiv Detail & Related papers (2022-05-07T10:08:58Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Statistical Theory for Imbalanced Binary Classification [8.93993657323783]
We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized.
Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance.
These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
arXiv Detail & Related papers (2021-07-05T03:55:43Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
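As an illustration of the idea (not the paper's actual methodology), one can sample many linear interpolators of an overparameterised binary problem and inspect how their test errors distribute; the parameterisation as the min-norm solution plus a random null-space component, and the scale `sigma`, are assumptions.

```python
# Hedged sketch: distribution of test errors over random linear interpolators.
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 40, 200, 1.0                 # n < d: interpolation is possible
w_star = rng.normal(size=d) / np.sqrt(d)   # planted teacher direction

def data(m):
    X = rng.normal(size=(m, d))
    return X, np.sign(X @ w_star)

X, y = data(n)
Xt, yt = data(5000)

w0, *_ = np.linalg.lstsq(X, y, rcond=None) # min-norm interpolator: X w0 = y
_, _, Vt = np.linalg.svd(X, full_matrices=True)
N = Vt[n:].T                               # orthonormal basis of null(X)

errors = []
for _ in range(500):
    w = w0 + sigma * (N @ rng.normal(size=d - n))   # still fits training data
    errors.append(np.mean(np.sign(Xt @ w) != yt))
print(f"typical error {np.median(errors):.3f}, worst sampled {max(errors):.3f}")
```

The spread between the median and the worst sampled error gives a crude sense of the concentration phenomenon described above.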
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Saliency-based Weighted Multi-label Linear Discriminant Analysis [101.12909759844946]
We propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks.
The proposed method is based on a probabilistic model for defining the weights of individual samples.
The Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
arXiv Detail & Related papers (2020-04-08T19:40:53Z)
- Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning [0.0]
Problems such as class imbalance, class overlap, small disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms.
A probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented.
The behavior and performance of several supervised algorithms are studied when training sets exhibit such problems.
arXiv Detail & Related papers (2020-04-06T20:32:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.