Related papers: Statistical Theory for Imbalanced Binary Classification

Statistical Theory for Imbalanced Binary Classification

URL: http://arxiv.org/abs/2107.01777v1
Date: Mon, 5 Jul 2021 03:55:43 GMT
Title: Statistical Theory for Imbalanced Binary Classification
Authors: Shashank Singh, Justin Khim
Abstract summary: We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized. Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance. These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
Score: 8.93993657323783
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Within the vast body of statistical theory developed for binary classification, few meaningful results exist for imbalanced classification, in which data are dominated by samples from one of the two classes. Existing theory faces at least two main challenges. First, meaningful results must consider more complex performance measures than classification accuracy. To address this, we characterize a novel generalization of the Bayes-optimal classifier to any performance metric computed from the confusion matrix, and we use this to show how relative performance guarantees can be obtained in terms of the error of estimating the class probability function under uniform ($\mathcal{L}_\infty$) loss. Second, as we show, optimal classification performance depends on certain properties of class imbalance that have not previously been formalized. Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance. We analyze how Uniform Class Imbalance influences optimal classifier performance and show that it necessitates different classifier behavior than other types of class imbalance. We further illustrate these two contributions in the case of $k$-nearest neighbor classification, for which we develop novel guarantees. Together, these results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.

Related papers

Improving the classification of extreme classes by means of loss regularisation and generalised beta distributions [8.640930010669042]
We propose a unimodal regularisation approach to improve the classification performance of the first and last classes. Performance in the extreme classes is compared using a new metric that takes into account their sensitivities. The results for the proposed metric show that the generalised beta distribution generally improves classification performance in the extreme classes.
arXiv Detail & Related papers (2024-07-17T08:57:42Z)
Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories. We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution. BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv Detail & Related papers (2023-08-04T09:11:07Z)
Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
Multi-class Classification with Fuzzy-feature Observations: Theory and Algorithms [36.810603503167755]
We propose a novel framework to address a new realistic problem called multi-class classification with imprecise observations (MCIMO) First, we give the theoretical analysis of the MCIMO problem based on fuzzy Rademacher complexity. Then, two practical algorithms based on support vector machine and neural networks are constructed to solve the proposed new problem.
arXiv Detail & Related papers (2022-06-09T07:14:00Z)
Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification [1.0312968200748118]
We study the impact of imbalance class sizes on the linear discriminant analysis (LDA) in high dimensions. We show that due to data scarcity in one class, referred to as the minority class, the LDA ignores the minority class yielding a maximum misclassification rate. We propose a new construction of a hard-conquering rule based on a divide-and-conquer technique that reduces the large difference between the misclassification rates.
arXiv Detail & Related papers (2021-11-05T07:44:28Z)
When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification. CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution. We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts [0.4970364068620607]
We introduce a novel method, which addresses the issues of class imbalance. We generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced. We evaluate our approach on a number of publicly available data sets and compare our proposed method to one of the most popular oversampling techniques.
arXiv Detail & Related papers (2020-11-30T13:24:47Z)
Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modernally precise analysis of linear multiclass classification. Our analysis reveals that the classification accuracy is highly distribution-dependent. The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification. We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations. In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis [15.85259386116784]
We identify two fundamental conditions that a performance index must satisfy to be respectively resilient to altering number of testing instances from each class and the number of classes in the test set. We investigate the capability of the indices to retain information about the classification performance over all the classes, even when the classifier exhibits extreme performance on some classes.
arXiv Detail & Related papers (2020-08-26T18:23:36Z)
M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, where deep neural networks suffer from generalizing to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes. Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.