Imbalanced classification: a paradigm-based review
- URL: http://arxiv.org/abs/2002.04592v2
- Date: Thu, 1 Jul 2021 02:08:35 GMT
- Title: Imbalanced classification: a paradigm-based review
- Authors: Yang Feng, Min Zhou, Xin Tong
- Abstract summary: Multiple resampling techniques have been proposed to address the class imbalance issues.
There is no general guidance on when to use each technique.
We provide a paradigm-based review of the common resampling techniques for binary classification under imbalanced class sizes.
- Score: 21.578692329486643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common issue for classification in scientific research and industry is the
existence of imbalanced classes. When sample sizes of different classes are
imbalanced in training data, naively implementing a classification method often
leads to unsatisfactory prediction results on test data. Multiple resampling
techniques have been proposed to address the class imbalance issues. Yet, there
is no general guidance on when to use each technique. In this article, we
provide a paradigm-based review of the common resampling techniques for binary
classification under imbalanced class sizes. The paradigms we consider include
the classical paradigm that minimizes the overall classification error, the
cost-sensitive learning paradigm that minimizes a cost-adjusted weighted sum of
type I and type II errors, and the Neyman-Pearson paradigm that minimizes the type II
error subject to a type I error constraint. Under each paradigm, we investigate
the combination of the resampling techniques and a few state-of-the-art
classification methods. For each pair of resampling techniques and
classification methods, we use simulation studies and a real data set on credit
card fraud to study the performance under different evaluation metrics. From
these extensive numerical experiments, we demonstrate, under each classification
paradigm, the complex dynamics among resampling techniques, base classification
methods, evaluation metrics, and imbalance ratios. We also summarize a few
takeaway messages regarding the choices of resampling techniques and base
classification methods, which could be helpful for practitioners.
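The simplest of the resampling techniques reviewed here are random oversampling (duplicating minority-class samples) and random undersampling (dropping majority-class samples). As a minimal, dependency-free sketch (illustrative only, not code from the paper), balancing a binary training set might look like:

```python
import random

def random_oversample(X, y, minority_label=1, seed=0):
    """Randomly duplicate minority-class samples until classes are balanced."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    # Draw extra minority samples with replacement to match the majority count.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    return [x for x, _ in combined], [t for _, t in combined]

def random_undersample(X, y, minority_label=1, seed=0):
    """Randomly discard majority-class samples until classes are balanced."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    # Keep only as many majority samples as there are minority samples.
    kept = rng.sample(majority, len(minority))
    combined = kept + minority
    rng.shuffle(combined)
    return [x for x, _ in combined], [t for _, t in combined]
```

Either resampled set would then be passed to a base classifier; as the paper's experiments show, which combination works best depends on the paradigm, the evaluation metric, and the imbalance ratio.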
Related papers
- Preview-based Category Contrastive Learning for Knowledge Distillation [53.551002781828146]
We propose a novel preview-based category contrastive learning method for knowledge distillation (PCKD).
It first distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers.
It can explicitly optimize the category representation and explore the distinct correlation between representations of instances and categories.
arXiv Detail & Related papers (2024-10-18T03:31:00Z)
- Observational and Experimental Insights into Machine Learning-Based Defect Classification in Wafers [0.8702432681310399]
This survey paper offers a comprehensive review of methodologies utilizing machine learning (ML) classification techniques for identifying wafer defects in semiconductor manufacturing.
We present an innovative taxonomy of methodologies that classifies algorithms into more refined categories and techniques.
arXiv Detail & Related papers (2023-10-16T14:46:45Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Learning Acceptance Regions for Many Classes with Anomaly Detection [19.269724165953274]
Many existing set-valued classification methods do not consider the possibility that a new class that never appeared in the training data appears in the test data.
We propose a Generalized Prediction Set (GPS) approach to estimate the acceptance regions while considering the possibility of a new class in the test data.
Unlike previous methods, the proposed method achieves a good balance between accuracy, efficiency, and anomaly detection rate.
arXiv Detail & Related papers (2022-09-20T19:40:33Z)
- Multi-class Classification with Fuzzy-feature Observations: Theory and Algorithms [36.810603503167755]
We propose a novel framework to address a new realistic problem called multi-class classification with imprecise observations (MCIMO).
First, we give the theoretical analysis of the MCIMO problem based on fuzzy Rademacher complexity.
Then, two practical algorithms based on support vector machine and neural networks are constructed to solve the proposed new problem.
arXiv Detail & Related papers (2022-06-09T07:14:00Z)
- Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts [0.4970364068620607]
We introduce a novel method that addresses the issue of class imbalance.
We generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced.
We evaluate our approach on a number of publicly available data sets and compare our proposed method to one of the most popular oversampling techniques.
arXiv Detail & Related papers (2020-11-30T13:24:47Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, where deep neural networks suffer from generalizing to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.