Appropriateness of Performance Indices for Imbalanced Data
Classification: An Analysis
- URL: http://arxiv.org/abs/2008.11752v1
- Date: Wed, 26 Aug 2020 18:23:36 GMT
- Title: Appropriateness of Performance Indices for Imbalanced Data
Classification: An Analysis
- Authors: Sankha Subhra Mullick and Shounak Datta and Sourish Gunesh Dhekane and
Swagatam Das
- Abstract summary: We identify two fundamental conditions that a performance index must satisfy to be resilient, respectively, to changes in the number of test instances from each class and in the number of classes in the test set.
We investigate the capability of the indices to retain information about the classification performance over all the classes, even when the classifier exhibits extreme performance on some classes.
- Score: 15.85259386116784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Indices quantifying the performance of classifiers under class
imbalance often suffer from distortions depending on the constitution of the
test set or the class-specific classification accuracy, creating difficulties
in assessing the merit of the classifier. We identify two fundamental
conditions that a performance index must satisfy to be resilient, respectively,
to changes in the number of test instances from each class and in the number of
classes in the test set.
In light of these conditions, under the effect of class imbalance, we
theoretically analyze four indices commonly used for evaluating binary
classifiers and five popular indices for multi-class classifiers. For indices
violating any of the conditions, we also suggest remedial modification and
normalization. We further investigate the capability of the indices to retain
information about the classification performance over all the classes, even
when the classifier exhibits extreme performance on some classes. Simulation
studies are performed on high dimensional deep representations of a subset of the
ImageNet dataset using four state-of-the-art classifiers tailored for handling
class imbalance. Finally, based on our theoretical findings and empirical
evidence, we recommend the appropriate indices that should be used to evaluate
the performance of classifiers in the presence of class imbalance.
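The first resilience condition can be made concrete with a minimal sketch (illustrative only, not code from the paper; the recall values are invented): for a classifier with fixed per-class recalls, plain accuracy shifts with the test set's class ratio, while an index that averages per-class recalls (balanced accuracy) does not.

```python
# A classifier with fixed per-class recalls, evaluated on test sets with
# different class ratios. Accuracy tracks the majority class; balanced
# accuracy (the mean of per-class recalls) is unaffected by the ratio.

def accuracy(n_pos, n_neg, recall_pos, recall_neg):
    correct = n_pos * recall_pos + n_neg * recall_neg
    return correct / (n_pos + n_neg)

def balanced_accuracy(recall_pos, recall_neg):
    return (recall_pos + recall_neg) / 2

recall_pos, recall_neg = 0.60, 0.95  # weak on the minority, strong on the majority

acc_even   = accuracy(500, 500, recall_pos, recall_neg)  # balanced test set: 0.775
acc_skewed = accuracy(10, 990, recall_pos, recall_neg)   # 1:99 test set: 0.9465
bal_acc    = balanced_accuracy(recall_pos, recall_neg)   # 0.775 either way

print(acc_even, acc_skewed, bal_acc)
```

The same classifier looks much stronger on the skewed test set under plain accuracy, which is exactly the distortion the resilience condition rules out.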
Related papers
- Improving the classification of extreme classes by means of loss regularisation and generalised beta distributions [8.640930010669042]
We propose a unimodal regularisation approach to improve the classification performance of the first and last classes.
Performance in the extreme classes is compared using a new metric that takes into account their sensitivities.
The results for the proposed metric show that the generalised beta distribution generally improves classification performance in the extreme classes.
arXiv Detail & Related papers (2024-07-17T08:57:42Z)
- Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z)
- Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker [57.49330031751386]
We find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset.
We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints.
arXiv Detail & Related papers (2023-02-21T15:17:13Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
Uncertainty is then used to perform the anomaly detection.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- The Impact of Using Regression Models to Build Defect Classifiers [13.840006058766766]
It is common practice to discretize continuous defect counts into defective and non-defective classes.
We compare the performance and interpretation of defect classifiers built using both approaches.
arXiv Detail & Related papers (2022-02-12T22:12:55Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Statistical Theory for Imbalanced Binary Classification [8.93993657323783]
We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized.
Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance.
These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
arXiv Detail & Related papers (2021-07-05T03:55:43Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first precise analysis of linear multiclass classification in the high-dimensional asymptotic regime.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification [11.125446871030734]
Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes.
We propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances.
arXiv Detail & Related papers (2020-10-12T19:47:09Z)
- On Model Evaluation under Non-constant Class Imbalance [0.0]
Many real-world classification problems are significantly class-imbalanced, to the detriment of the class of interest.
The usual assumption is that the test dataset imbalance equals the real-world imbalance.
We introduce methods focusing on evaluation under non-constant class imbalance.
arXiv Detail & Related papers (2020-01-15T21:52:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.