Multiclass Classification via Class-Weighted Nearest Neighbors
- URL: http://arxiv.org/abs/2004.04715v2
- Date: Mon, 4 May 2020 00:40:57 GMT
- Title: Multiclass Classification via Class-Weighted Nearest Neighbors
- Authors: Justin Khim, Ziyu Xu and Shashank Singh
- Abstract summary: We study statistical properties of the k-nearest neighbors algorithm for multiclass classification.
We derive upper and minimax lower bounds on accuracy, class-weighted risk, and uniform error.
- Score: 10.509405690286176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study statistical properties of the k-nearest neighbors algorithm for
multiclass classification, with a focus on settings where the number of classes
may be large and/or classes may be highly imbalanced. In particular, we
consider a variant of the k-nearest neighbor classifier with non-uniform
class-weightings, for which we derive upper and minimax lower bounds on
accuracy, class-weighted risk, and uniform error. Additionally, we show that
uniform error bounds lead to bounds on the difference between empirical
confusion matrix quantities and their population counterparts across a set of
weights. As a result, we may adjust the class weights to optimize
classification metrics such as F1 score or Matthews Correlation Coefficient
that are commonly used in practice, particularly in settings with imbalanced
classes. We additionally provide a simple example to instantiate our bounds and
numerical experiments.
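The class-weighted variant of k-nearest neighbors described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function name and uniform Euclidean setup are assumptions for illustration. Each of the k nearest neighbors casts a vote scaled by the weight of its class, so up-weighting a minority class shifts predictions toward it:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k, class_weights):
    """Class-weighted k-NN: each of the k nearest neighbors of x votes
    with the weight assigned to its class; the highest total wins."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nn = np.argsort(dists)[:k]                    # indices of k nearest points
    votes = np.zeros(len(class_weights))
    for i in nn:
        votes[y_train[i]] += class_weights[y_train[i]]
    return int(np.argmax(votes))
```

With uniform weights this reduces to the standard k-NN classifier; in the spirit of the paper, the weight vector could then be tuned on held-out data to optimize a metric such as F1.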
Related papers
- Class Uncertainty: A Measure to Mitigate Class Imbalance [0.0]
We show that considering solely the cardinality of classes does not cover all issues causing class imbalance.
We propose "Class Uncertainty" as the average predictive uncertainty of the training examples.
We also curate SVCI-20 as a novel dataset in which the classes have an equal number of training examples but differ in their hardness.
arXiv Detail & Related papers (2023-11-23T16:36:03Z)
- Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z)
- Generalization for multiclass classification with overparameterized linear models [3.3434274586532515]
We show that multiclass classification behaves like binary classification in that, as long as there are not too many classes, it is possible to generalize well.
Besides various technical challenges, it turns out that the key difference from the binary classification setting is that there are relatively fewer positive training examples of each class in the multiclass setting as the number of classes increases.
arXiv Detail & Related papers (2022-06-03T05:52:43Z)
- Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification [1.0312968200748118]
We study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions.
We show that due to data scarcity in one class, referred to as the minority class, LDA ignores the minority class, yielding a maximal misclassification rate on it.
We propose a new construction of a hard-thresholding rule based on a divide-and-conquer technique that reduces the large difference between the misclassification rates.
arXiv Detail & Related papers (2021-11-05T07:44:28Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10LT, CIFAR-100LT and WebVision datasets, observing that Prototypical obtains substantial improvements compared with state-of-the-art methods.
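The prototype idea summarized above can be sketched as a nearest-class-mean rule over fixed embeddings. This is a simplified illustration under assumed inputs, not the paper's method: the only quantities computed are the per-class means, so no additional parameters are fit and predictions are not skewed by class frequencies.

```python
import numpy as np

def nearest_prototype_predict(embeddings, labels, query, num_classes):
    """Classify a query embedding by its nearest class prototype,
    where each prototype is the mean embedding of that class."""
    prototypes = np.stack([embeddings[labels == c].mean(axis=0)
                           for c in range(num_classes)])
    dists = np.linalg.norm(prototypes - query, axis=1)
    return int(np.argmin(dists))
```

Because each class contributes exactly one prototype regardless of how many training examples it has, the decision rule treats frequent and rare classes symmetrically.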
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
- Statistical Theory for Imbalanced Binary Classification [8.93993657323783]
We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized.
Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance.
These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
arXiv Detail & Related papers (2021-07-05T03:55:43Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes this ratio during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Cautious Active Clustering [79.23797234241471]
We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space.
Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class.
arXiv Detail & Related papers (2020-08-03T23:47:31Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.