Statistical Theory for Imbalanced Binary Classification
- URL: http://arxiv.org/abs/2107.01777v1
- Date: Mon, 5 Jul 2021 03:55:43 GMT
- Title: Statistical Theory for Imbalanced Binary Classification
- Authors: Shashank Singh, Justin Khim
- Abstract summary: We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized.
Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance.
These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
- Score: 8.93993657323783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Within the vast body of statistical theory developed for binary
classification, few meaningful results exist for imbalanced classification, in
which data are dominated by samples from one of the two classes. Existing
theory faces at least two main challenges. First, meaningful results must
consider more complex performance measures than classification accuracy. To
address this, we characterize a novel generalization of the Bayes-optimal
classifier to any performance metric computed from the confusion matrix, and we
use this to show how relative performance guarantees can be obtained in terms
of the error of estimating the class probability function under uniform
($\mathcal{L}_\infty$) loss. Second, as we show, optimal classification
performance depends on certain properties of class imbalance that have not
previously been formalized. Specifically, we propose a novel sub-type of class
imbalance, which we call Uniform Class Imbalance. We analyze how Uniform Class
Imbalance influences optimal classifier performance and show that it
necessitates different classifier behavior than other types of class imbalance.
We further illustrate these two contributions in the case of $k$-nearest
neighbor classification, for which we develop novel guarantees. Together, these
results provide some of the first meaningful finite-sample statistical theory
for imbalanced binary classification.
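The abstract's two ingredients, a class probability estimate and a metric computed from the confusion matrix, can be illustrated with a minimal sketch. It is not the paper's construction: it assumes that the classifier thresholds a $k$-nearest neighbor estimate of the class probability function and that the threshold is chosen by grid search to maximize F1 on held-out data. The 1-D Gaussian data, the 10% positive rate, `k = 15`, and all function names are hypothetical choices made for illustration.

```python
import random

def knn_prob(x, train, k):
    """Estimate P(Y=1 | X=x) as the fraction of positives among the k nearest points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def confusion(probs, labels, t):
    """Confusion-matrix counts (tp, fp, fn, tn) for the rule: predict 1 iff prob >= t."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= t else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1(tp, fp, fn, tn):
    """F1 score; any other function of the four counts could be substituted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(probs, labels, metric, grid):
    """Pick the threshold on the grid that maximizes the given metric."""
    return max(grid, key=lambda t: metric(*confusion(probs, labels, t)))

def sample(n, rng, pos_rate=0.1):
    """Synthetic imbalanced data (assumption): about 10% positives, shifted Gaussian features."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else 0
        data.append((rng.gauss(1.5 if y else 0.0, 1.0), y))
    return data

rng = random.Random(0)
train, valid = sample(500, rng), sample(500, rng)
k = 15
probs = [knn_prob(x, train, k) for x, _ in valid]
labels = [y for _, y in valid]

# k-NN estimates are multiples of 1/k, so these k thresholds cover every distinct rule
# except "predict everything positive".
grid = [i / k for i in range(1, k + 1)]
t_star = best_threshold(probs, labels, f1, grid)
f1_star = f1(*confusion(probs, labels, t_star))
f1_half = f1(*confusion(probs, labels, 0.5))
print(f"tuned threshold={t_star:.3f}  F1={f1_star:.3f}  (F1 at 0.5: {f1_half:.3f})")
```

On imbalanced data the tuned threshold will often differ from 0.5, the accuracy-optimal cut-off, which is one concrete way to see why accuracy-based theory does not transfer directly. Because the threshold is tuned and scored on the same held-out set, the reported F1 is optimistic.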
Related papers
- Improving the classification of extreme classes by means of loss regularisation and generalised beta distributions [8.640930010669042]
We propose a unimodal regularisation approach to improve the classification performance of the first and last classes.
Performance in the extreme classes is compared using a new metric that takes into account their sensitivities.
The results for the proposed metric show that the generalised beta distribution generally improves classification performance in the extreme classes.
arXiv Detail & Related papers (2024-07-17T08:57:42Z)
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv Detail & Related papers (2023-08-04T09:11:07Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Multi-class Classification with Fuzzy-feature Observations: Theory and Algorithms [36.810603503167755]
We propose a novel framework to address a new, realistic problem called multi-class classification with imprecise observations (MCIMO).
First, we give the theoretical analysis of the MCIMO problem based on fuzzy Rademacher complexity.
Then, two practical algorithms based on support vector machine and neural networks are constructed to solve the proposed new problem.
arXiv Detail & Related papers (2022-06-09T07:14:00Z)
- Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification [1.0312968200748118]
We study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions.
We show that, due to data scarcity in one class, referred to as the minority class, LDA ignores the minority class, yielding a maximum misclassification rate.
We propose a new construction of a hard-thresholding rule based on a divide-and-conquer technique that reduces the large difference between the misclassification rates.
arXiv Detail & Related papers (2021-11-05T07:44:28Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts [0.4970364068620607]
We introduce a novel method, which addresses the issues of class imbalance.
We generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced.
We evaluate our approach on a number of publicly available data sets and compare our proposed method to one of the most popular oversampling techniques.
arXiv Detail & Related papers (2020-11-30T13:24:47Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis [15.85259386116784]
We identify two fundamental conditions that a performance index must satisfy to be resilient, respectively, to changes in the number of testing instances from each class and to changes in the number of classes in the test set.
We investigate the capability of the indices to retain information about the classification performance over all the classes, even when the classifier exhibits extreme performance on some classes.
arXiv Detail & Related papers (2020-08-26T18:23:36Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)