Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective
- URL: http://arxiv.org/abs/2408.03526v1
- Date: Wed, 7 Aug 2024 03:37:25 GMT
- Title: Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective
- Authors: Yi-Yang Shangguan, Shi-Shun Chen, Xiao-Yang Li
- Abstract summary: Class imbalance refers to the significant difference in the number of samples from different classes within a dataset.
This issue is prevalent in real-world classification tasks, such as software defect prediction, medical diagnosis, and fraud detection.
The synthetic minority oversampling technique (SMOTE) is widely used to address the class imbalance issue.
This paper proposes the Minimum Enclosing Ball SMOTE (MEB-SMOTE) method from a geometric perspective.
- Score: 1.7851435784917604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance refers to the significant difference in the number of samples from different classes within a dataset, making it challenging to identify minority class samples correctly. This issue is prevalent in real-world classification tasks, such as software defect prediction, medical diagnosis, and fraud detection. The synthetic minority oversampling technique (SMOTE), which is based on interpolation between randomly selected minority class samples and their neighbors, is widely used to address the class imbalance issue. However, traditional SMOTE and most of its variants only interpolate between existing samples, which may be affected by noise samples in some cases and synthesize samples that lack diversity. To overcome these shortcomings, this paper proposes the Minimum Enclosing Ball SMOTE (MEB-SMOTE) method from a geometric perspective. Specifically, the MEB is innovatively introduced into the oversampling method to construct a representative point. High-quality samples are then synthesized by interpolation between this representative point and the existing samples. The rationale behind constructing a representative point is discussed, demonstrating that the center of the MEB is more suitable as the representative point. To exhibit the superiority of MEB-SMOTE, experiments are conducted on 15 real-world imbalanced datasets. The results indicate that MEB-SMOTE effectively improves classification performance on imbalanced datasets.
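The core idea described in the abstract can be sketched in a few lines: approximate the minimum enclosing ball (MEB) center of the minority class, then interpolate between that center and existing minority samples. This is a minimal illustrative sketch, not the paper's exact algorithm; the MEB center is approximated here with the Badoiu-Clarkson iteration (repeatedly stepping toward the farthest point), and the function names `meb_center` and `meb_smote` are my own.

```python
import numpy as np

def meb_center(points, iters=200, seed=0):
    """Approximate the minimum enclosing ball center via the
    Badoiu-Clarkson iteration: start at an arbitrary sample and
    repeatedly step toward the point farthest from the current center."""
    rng = np.random.default_rng(seed)
    c = points[rng.integers(len(points))].astype(float)
    for i in range(1, iters + 1):
        farthest = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (farthest - c) / (i + 1)  # shrinking step size guarantees convergence
    return c

def meb_smote(minority, n_new, seed=0):
    """Synthesize n_new samples by interpolating between the MEB center
    (the representative point) and randomly chosen minority samples."""
    rng = np.random.default_rng(seed)
    c = meb_center(minority)
    idx = rng.integers(0, len(minority), size=n_new)
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))
    # each synthetic sample lies on the segment between a real sample and c
    return minority[idx] + lam * (c - minority[idx])
```

Because every synthetic point lies on a segment ending at the ball's center, noisy boundary samples pull the result toward the class interior rather than toward other (possibly noisy) neighbors, which is the geometric motivation the abstract gives.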
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
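The mixing strategy summarized above (synthesizing minority samples from both classes) can be illustrated with a simple convex-combination sketch. This is an assumption-laden toy version, not the cited paper's iterative framework: `mix_oversample` and the `alpha` weighting toward the minority side are hypothetical names I introduce for illustration.

```python
import numpy as np

def mix_oversample(minority, majority, n_new, alpha=0.8, seed=0):
    """Create synthetic minority samples as convex combinations of a
    random minority sample and a random majority sample, with the
    mixing weight lam drawn from [alpha, 1) so the result stays
    close to the minority side."""
    rng = np.random.default_rng(seed)
    mi = minority[rng.integers(0, len(minority), size=n_new)]
    ma = majority[rng.integers(0, len(majority), size=n_new)]
    lam = rng.uniform(alpha, 1.0, size=(n_new, 1))
    return lam * mi + (1.0 - lam) * ma
```

Mixing across classes, rather than only within the minority class as classic SMOTE does, lets the synthetic points probe the decision boundary region between the two classes.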
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - BSGAN: A Novel Oversampling Technique for Imbalanced Pattern Recognitions [0.0]
Class imbalanced problems (CIP) are one of the potential challenges in developing unbiased Machine Learning (ML) models for predictions.
CIP occurs when data samples are not equally distributed between two or more classes.
We propose a hybrid oversampling technique that combines the power of borderline SMOTE and a Generative Adversarial Network to generate more diverse data.
arXiv Detail & Related papers (2023-05-16T20:02:39Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly outperforms the state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z) - Imbalanced Class Data Performance Evaluation and Improvement using Novel Generative Adversarial Network-based Approach: SSG and GBO [0.0]
This study proposes two novel techniques: GAN-based Oversampling (GBO) and Support Vector Machine-SMOTE-GAN (SSG).
Preliminary computational results show that SSG and GBO performed better than the original SMOTE on the eight expanded imbalanced benchmark datasets.
arXiv Detail & Related papers (2022-10-23T22:17:54Z) - Imbalanced Classification via a Tabular Translation GAN [4.864819846886142]
We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples.
We show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques.
arXiv Detail & Related papers (2022-04-19T06:02:53Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme mixes a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z) - A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
An imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers.
We propose a novel three step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z) - Minority Class Oversampling for Tabular Data with Deep Generative Models [4.976007156860967]
We study the ability of deep generative models to provide realistic samples that improve performance on imbalanced classification tasks via oversampling.
Our experiments show that the choice of sampling method does not affect sample quality, but runtime varies widely.
We also observe that the improvements in terms of performance metric, while shown to be significant, often are minor in absolute terms.
arXiv Detail & Related papers (2020-05-07T21:35:57Z) - M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, where deep neural networks suffer from generalizing to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.