A Survey of Methods for Managing the Classification and Solution of Data
Imbalance Problem
- URL: http://arxiv.org/abs/2012.11870v1
- Date: Tue, 22 Dec 2020 08:03:22 GMT
- Title: A Survey of Methods for Managing the Classification and Solution of Data
Imbalance Problem
- Authors: Khan Md. Hasib, Md. Sadiq Iqbal, Faisal Muhammad Shah, Jubayer Al
Mahmud, Mahmudul Hasan Popel, Md. Imran Hossain Showrov, Shakil Ahmed,
Obaidur Rahman
- Abstract summary: This paper focuses on the architecture of single, hybrid, and ensemble method designs in order to understand the current state of techniques for improving classification performance in machine learning under class imbalance.
This survey paper also includes a statistical analysis of the classification algorithms under various methods and experimental conditions, as well as of the datasets used in different research papers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of class imbalance is pervasive in numerous real-world
applications. In such situations, nearly all of the examples belong to one
class, called the majority class, while far fewer examples belong to the
other, usually more important, class, called the minority class. Over the last
few years, several lines of research have addressed the issue of class
imbalance, including data sampling, cost-sensitive analysis, Genetic
Programming based models, bagging, and boosting. In this survey paper, we
review 24 related studies from the years 2003, 2008, 2010, 2012 and 2014 to
2019, focusing on the architecture of single, hybrid, and ensemble method
designs in order to understand the current state of techniques for improving
classification performance in machine learning under class imbalance. This
survey paper also includes a statistical analysis of the classification
algorithms under various methods and experimental conditions, as well as of
the datasets used in different research papers.
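As a concrete illustration of the data sampling family named in the abstract, the sketch below shows random oversampling of the minority class in plain NumPy. The function name and interface are illustrative only and are not taken from any of the surveyed papers; a binary classification dataset held in arrays `X` and `y` is assumed.

```python
import numpy as np

def random_oversample(X, y, random_state=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same number of examples (binary labels assumed)."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()

    minority_idx = np.where(y == minority)[0]
    extra_idx = rng.choice(minority_idx, size=n_needed, replace=True)

    X_bal = np.vstack([X, X[extra_idx]])
    y_bal = np.concatenate([y, y[extra_idx]])
    return X_bal, y_bal

# Example: 95 majority examples vs. 5 minority examples.
X = np.random.randn(100, 4)
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # [95 95]
```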
Related papers
- A Survey of Deep Long-Tail Classification Advancements [1.6233132273470656]
Many data distributions in the real world are hardly uniform. Instead, skewed and long-tailed distributions of various kinds are commonly observed.
This poses an interesting problem for machine learning, where most algorithms assume or work well with uniformly distributed data.
The problem is further exacerbated by current state-of-the-art deep learning models requiring large volumes of training data.
arXiv Detail & Related papers (2024-04-24T01:59:02Z) - A Survey of Methods for Handling Disk Data Imbalance [10.261915886145214]
This paper provides a comprehensive overview of research in the field of imbalanced data classification.
The Backblaze dataset, a widely used dataset related to hard disks, contains a small amount of failure data and a large amount of healthy data, which exhibits a serious class imbalance.
arXiv Detail & Related papers (2023-10-13T05:35:13Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
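The summary above describes synthetic samples obtained by mixing minority and majority examples. The snippet below is a minimal, hypothetical sketch of that mixing idea as a convex combination of feature vectors; it is not the paper's actual algorithm, and the function name, the `alpha` parameter, and the choice to treat the mixtures as minority-class points are assumptions made here for illustration.

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.7, random_state=0):
    """Create n_new synthetic feature vectors, each a convex combination of a
    random minority example and a random majority example. The mixing weight
    stays above `alpha` on the minority side, so the synthetic points remain
    close to the minority region and can be labeled as minority examples."""
    rng = np.random.default_rng(random_state)
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    lam = rng.uniform(alpha, 1.0, size=(n_new, 1))  # weight on the minority sample
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]
```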
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - A Survey of Methods for Addressing Class Imbalance in Deep-Learning
Based Natural Language Processing [68.37496795076203]
We provide guidance for NLP researchers and practitioners dealing with imbalanced data.
We first discuss various types of controlled and real-world class imbalance.
We organize the methods by whether they are based on sampling, data augmentation, choice of loss function, staged learning, or model design.
arXiv Detail & Related papers (2022-10-10T13:26:40Z) - Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification.
It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples.
We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
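Outside of the complementary-label setting, the weighted-loss idea is commonly realized as inverse-frequency class weights applied to a standard cross-entropy loss. The sketch below is only an assumption about one common way such weighting is implemented (here with PyTorch), not the loss proposed in the paper above.

```python
import torch
import torch.nn as nn

def inverse_frequency_weights(labels, n_classes):
    """Weight each class inversely to its frequency in the training labels,
    so that rare (minority) classes contribute more to the loss."""
    counts = torch.bincount(labels, minlength=n_classes).float()
    return counts.sum() / (n_classes * counts.clamp(min=1))

# Example: 3-class problem with a heavily under-represented class 2.
labels = torch.tensor([0] * 90 + [1] * 8 + [2] * 2)
weights = inverse_frequency_weights(labels, n_classes=3)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(100, 3)      # stand-in model outputs
loss = criterion(logits, labels)  # mistakes on minority classes cost more
```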
arXiv Detail & Related papers (2022-09-28T16:02:42Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
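As a counterpart to the oversampling sketch shown after the abstract, the following is a minimal illustration of random undersampling, which drops randomly chosen majority-class examples until the class counts match. The function and its interface are illustrative, not taken from the paper above.

```python
import numpy as np

def random_undersample(X, y, random_state=0):
    """Keep only as many rows per class as the smallest class has,
    discarding randomly chosen examples from the larger classes."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep_idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_keep, replace=False)
        for c in classes
    ])
    return X[keep_idx], y[keep_idx]
```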
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first precise high-dimensional asymptotic analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to the state of the art, and an extended ensemble establishes a new state of the art on two benchmarks for long-tailed recognition.
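One simple reading of a class-balanced expert ensemble is sketched below, under the assumption that each expert is an ordinary classifier fit on a class-balanced bootstrap and that predictions are averaged. The base learner, the number of experts, and the aggregation rule are illustrative choices, not the architecture used in the paper above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_balanced_experts(X, y, n_experts=5, random_state=0):
    """Fit one classifier per expert, each on a class-balanced bootstrap:
    the same number of examples is drawn (with replacement) from every class."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    experts = []
    for _ in range(n_experts):
        idx = np.concatenate([
            rng.choice(np.where(y == c)[0], size=n_per_class, replace=True)
            for c in classes
        ])
        experts.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return experts

def predict_ensemble(experts, X):
    """Average the experts' predicted class probabilities and take the argmax."""
    probs = np.mean([e.predict_proba(X) for e in experts], axis=0)
    return experts[0].classes_[np.argmax(probs, axis=1)]
```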
arXiv Detail & Related papers (2020-04-07T20:57:44Z) - M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z) - Imbalanced classification: a paradigm-based review [21.578692329486643]
Multiple resampling techniques have been proposed to address the class imbalance issues.
There is no general guidance on when to use each technique.
We provide a paradigm-based review of the common resampling techniques for binary classification under imbalanced class sizes.
arXiv Detail & Related papers (2020-02-11T18:34:48Z)