A Synthetic Over-sampling method with Minority and Majority classes for
imbalance problems
- URL: http://arxiv.org/abs/2011.04170v2
- Date: Tue, 10 Aug 2021 05:40:44 GMT
- Title: A Synthetic Over-sampling method with Minority and Majority classes for
imbalance problems
- Authors: Hadi A. Khorshidi and Uwe Aickelin
- Abstract summary: We propose Synthetic Over-sampling with Minority and Majority classes (SOMM), a new method for generating synthetic instances.
SOMM generates synthetic instances diversely within the minority data space.
It adaptively updates the generated instances based on their neighbourhood, which includes both classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalance is a substantial challenge in classifying many real-world
cases. Synthetic over-sampling methods have been effective in improving the
performance of classifiers for imbalance problems. However, most synthetic
over-sampling methods generate non-diverse synthetic instances within the
convex hull formed by the existing minority instances as they only concentrate
on the minority class and ignore the vast information provided by the majority
class. They also often perform poorly on extremely imbalanced data, since the
fewer the minority instances, the less information is available for generating
synthetic instances. Moreover, existing methods that generate synthetic instances using
the majority class distributional information cannot perform effectively when
the majority class has a multi-modal distribution. We propose a new method to
generate diverse and adaptable synthetic instances using Synthetic
Over-sampling with Minority and Majority classes (SOMM). SOMM generates
synthetic instances diversely within the minority data space and adaptively
updates them based on their neighbourhood, which includes instances from both classes.
Thus, SOMM performs well for both binary and multiclass imbalance problems. We
examine the performance of SOMM for binary and multiclass problems using
benchmark data sets for different imbalance levels. The empirical results show
the superiority of SOMM compared to other existing methods.
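To make the convex-hull limitation concrete, here is a minimal NumPy sketch of SMOTE-style interpolation, the kind of baseline the abstract contrasts against (this is an illustration, not SOMM itself): every synthetic point is a convex combination of two existing minority instances, so the generated set can never leave the minority convex hull.
```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """SMOTE-style interpolation (illustrative baseline, not SOMM).

    Each synthetic point lies on the segment between a minority
    instance and one of its k nearest minority neighbours, so every
    output stays inside the convex hull of the minority class.
    """
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    # pairwise distances within the minority class only -- the majority
    # class is never consulted, which is the limitation the abstract notes
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synth = np.empty((n_new, d))
    for i in range(n_new):
        a = rng.integers(n)                             # random minority seed point
        b = neighbours[a, rng.integers(min(k, n - 1))]  # one of its nearest neighbours
        lam = rng.random()                              # convex weight in [0, 1)
        synth[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synth
```
For example, `smote_like(np.random.rand(10, 2), n_new=50)` returns 50 points that all lie within the convex hull of the 10 inputs; SOMM's stated goal is to escape exactly this restriction by also consulting the majority-class neighbourhood.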
Related papers
- Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective [1.7851435784917604]
Class imbalance refers to the significant difference in the number of samples from different classes within a dataset.
This issue is prevalent in real-world classification tasks, such as software defect prediction, medical diagnosis, and fraud detection.
The synthetic minority oversampling technique (SMOTE) is widely used to address the class imbalance issue (cf. the SMOTE-style sketch above).
This paper proposes the Minimum Enclosing Ball SMOTE (MEB-SMOTE) method from a geometric perspective.
arXiv Detail & Related papers (2024-08-07T03:37:25Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes (a hypothetical majority-aware update in this spirit is sketched after this list).
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining.
Our method significantly improves retrieval performance, outperforming state-of-the-art methods by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique for tackling imbalanced learning by generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation [4.454557728745761]
Learning from class-imbalanced datasets poses challenges for machine learning algorithms.
We advance a novel data augmentation method (adapted from eXplainable AI) that generates synthetic, counterfactual instances in the minority class.
Several experiments using four different classifiers and 25 datasets are reported, which show that this Counterfactual Augmentation method (CFA) generates useful synthetic data points in the minority class.
arXiv Detail & Related papers (2021-11-05T14:14:06Z)
- Synthesising Multi-Modal Minority Samples for Tabular Data [3.7311680121118345]
Adding synthetic minority samples to the dataset before training is a popular technique to address class imbalance.
We propose a latent space framework which maps the multi-modal samples to a dense continuous latent space.
We show that our framework generates better synthetic data than the existing methods.
arXiv Detail & Related papers (2021-05-17T23:54:08Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers.
We propose a novel three-step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
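Several entries above, like SOMM itself, use majority-class information to place synthetic points rather than interpolating among minority instances alone. As a purely illustrative sketch, assuming a simple repulsion rule of our own (not the published algorithm of SOMM, M2m, or any paper listed here), candidates could be nudged away from nearby majority instances so that they adapt to a neighbourhood containing both classes:
```python
import numpy as np

def repel_from_majority(synth, X_maj, step=0.1, radius=1.0, iters=5):
    """Hypothetical adaptive update (our assumption, not a published
    algorithm): push each synthetic candidate away from majority
    instances within `radius`, so candidates drift out of regions
    the majority class occupies."""
    synth = np.array(synth, dtype=float)
    for _ in range(iters):
        for i, s in enumerate(synth):
            diff = s - X_maj                        # vectors pointing away from majority points
            dist = np.linalg.norm(diff, axis=1)
            close = dist < radius
            if not close.any():
                continue                            # no majority point nearby; keep candidate
            w = (radius - dist[close]) / radius     # closer majority points push harder
            push = diff[close] / (dist[close][:, None] + 1e-12)
            synth[i] = s + step * (w[:, None] * push).mean(axis=0)
    return synth

# Hypothetical pipeline combining both sketches:
# candidates = smote_like(X_min, n_new=100)
# X_synth = repel_from_majority(candidates, X_maj)
```
In this toy pipeline, diversity comes from the random interpolation inside the minority region and adaptivity from the majority-aware repulsion step; the actual update rules of the papers above differ and should be taken from the papers themselves.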
This list is automatically generated from the titles and abstracts of the papers on this site.