Handling Imbalanced Data: A Case Study for Binary Class Problems
- URL: http://arxiv.org/abs/2010.04326v1
- Date: Fri, 9 Oct 2020 02:04:14 GMT
- Title: Handling Imbalanced Data: A Case Study for Binary Class Problems
- Authors: Richmond Addo Danquah
- Abstract summary: Class imbalance is a major issue in solving classification problems.
This paper focuses on synthetic oversampling techniques and manually computes synthetic data points to make the algorithms easier to comprehend.
We analyze the application of these techniques to binary classification problems with different imbalance ratios and sample sizes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For many years, class imbalance has been a major issue in solving classification problems. Because most machine learning algorithms assume by default that the data are balanced, they do not take the class distribution of the data sample into account. The results tend to be unsatisfactory and skewed toward the majority class. Consequently, a model built on imbalanced data without handling the imbalance can be misleading in both practice and theory. Most researchers have applied the Synthetic Minority Oversampling Technique (SMOTE) and the Adaptive Synthetic (ADASYN) sampling approach independently in their work, but have failed to clearly explain the algorithms behind these techniques with computed examples. This paper focuses on both synthetic oversampling techniques and manually computes synthetic data points to make the algorithms easier to comprehend. We analyze the application of these techniques to binary classification problems with different imbalance ratios and sample sizes.
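Since the paper's goal is to make SMOTE and ADASYN easy to follow through computed examples, a minimal Python sketch of the core step of each algorithm is given below. It assumes NumPy and scikit-learn; the function names `smote_sample` and `adasyn_counts`, and the convention that label 0 marks the majority class, are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, n_synthetic, k=5, seed=0):
    """SMOTE core step: interpolate between a minority point and one of
    its k nearest minority neighbors, x_new = x_i + lam * (x_z - x_i)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Column 0 is each query point itself, so keep only the k true neighbors.
    neighbors = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for j in range(n_synthetic):
        i = rng.integers(len(X_min))      # pick a random minority sample
        z = rng.choice(neighbors[i])      # pick one of its k minority neighbors
        lam = rng.random()                # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + lam * (X_min[z] - X_min[i])
    return synthetic

def adasyn_counts(X_min, X_all, y_all, n_synthetic, k=5, majority_label=0):
    """ADASYN allocation: minority points with more majority-class neighbors
    (harder to learn) receive proportionally more synthetic points."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    # Queries come from X_all, so each point's nearest neighbor is itself.
    neighbors = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    # r_i = (majority neighbors) / k, normalized into a density distribution.
    r = (y_all[neighbors] == majority_label).sum(axis=1) / k
    r_hat = r / r.sum()
    # g_i: number of synthetic points to generate for each minority sample.
    return np.rint(r_hat * n_synthetic).astype(int)
```

SMOTE places each synthetic point on the line segment between a minority sample and one of its k nearest minority neighbors, at x_new = x_i + λ(x_z − x_i) with λ drawn uniformly from [0, 1). ADASYN reuses this interpolation but first computes, for each minority sample, the fraction of majority points among its k nearest neighbors and normalizes these fractions into a distribution, so that harder-to-learn minority samples receive more synthetic points.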
Related papers
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to the scarcity of high-quality data for training large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data [1.7495515703051119]
We propose a technique for data balancing by generating synthetic data for the minority class.
Our method prioritizes balancing the informative regions by identifying high entropy samples.
Our experimental results on forty-one datasets demonstrate the superior performance of our technique.
arXiv Detail & Related papers (2024-01-05T01:08:26Z)
- Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory [0.0]
In supervised learning, real imbalanced datasets are frequently encountered.
We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates.
We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique for tackling imbalanced learning by generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- A Novel Hybrid Sampling Framework for Imbalanced Learning [0.0]
"SMOTE-RUS-NC" has been compared with other state-of-the-art sampling techniques.
Rigorous experimentation has been conducted on 26 imbalanced datasets.
arXiv Detail & Related papers (2022-08-20T07:04:00Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Survey of Imbalanced Data Methodologies [1.370633147306388]
We applied the under-sampling/over-sampling methodologies to several modeling algorithms on UCI and Keel data sets.
Performance was compared across class-imbalance methods, modeling algorithms, and grid-search criteria.
arXiv Detail & Related papers (2021-04-06T02:10:22Z)
- Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification [2.9283685972609494]
Oversampling is an effective method for addressing imbalanced classification.
Inaccurate labels of synthetic samples would distort the distribution of the dataset.
This paper introduces the idea of weakly supervised learning to handle the inaccurate labeling of synthetic samples.
arXiv Detail & Related papers (2020-09-29T15:26:34Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
- Machine Learning Pipeline for Pulsar Star Dataset [58.720142291102135]
This work brings together some of the most common machine learning (ML) algorithms.
The objective is to compare the results obtained by these algorithms on a set of unbalanced data.
arXiv Detail & Related papers (2020-05-03T23:35:44Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.