A Synthetic Over-sampling method with Minority and Majority classes for
imbalance problems
- URL: http://arxiv.org/abs/2011.04170v2
- Date: Tue, 10 Aug 2021 05:40:44 GMT
- Title: A Synthetic Over-sampling method with Minority and Majority classes for
imbalance problems
- Authors: Hadi A. Khorshidi and Uwe Aickelin
- Abstract summary: We propose Synthetic Over-sampling with Minority and Majority classes (SOMM), a new method for generating synthetic instances.
SOMM generates synthetic instances diversely within the minority data space.
It adaptively updates the generated instances based on their neighbourhood, which includes both classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalance is a substantial challenge in classifying many real-world
cases. Synthetic over-sampling methods have been effective in improving the
performance of classifiers for imbalance problems. However, most synthetic
over-sampling methods generate non-diverse synthetic instances within the
convex hull formed by the existing minority instances as they only concentrate
on the minority class and ignore the vast information provided by the majority
class. They also often perform poorly on extremely imbalanced data, since the
fewer the minority instances, the less information is available for generating
synthetic instances. Moreover, existing methods that generate synthetic instances using
the majority class distributional information cannot perform effectively when
the majority class has a multi-modal distribution. We propose a new method to
generate diverse and adaptable synthetic instances using Synthetic
Over-sampling with Minority and Majority classes (SOMM). SOMM generates
synthetic instances diversely within the minority data space and adaptively
updates them based on their neighbourhood, which includes instances from both classes.
Thus, SOMM performs well for both binary and multiclass imbalance problems. We
examine the performance of SOMM for binary and multiclass problems using
benchmark data sets for different imbalance levels. The empirical results show
the superiority of SOMM compared to other existing methods.
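To make the convex-hull limitation concrete, here is a minimal NumPy sketch of SMOTE-style interpolation, the kind of baseline the abstract contrasts against (this is an illustration, not SOMM itself): every synthetic point is a convex combination of two existing minority instances, so the generated set can never leave the minority convex hull.
```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """SMOTE-style interpolation (illustrative baseline, not SOMM).

    Each synthetic point lies on the segment between a minority
    instance and one of its k nearest minority neighbours, so every
    output stays inside the convex hull of the minority class.
    """
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    # pairwise distances within the minority class only -- the majority
    # class is never consulted, which is the limitation the abstract notes
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synth = np.empty((n_new, d))
    for i in range(n_new):
        a = rng.integers(n)                             # random minority seed point
        b = neighbours[a, rng.integers(min(k, n - 1))]  # one of its nearest neighbours
        lam = rng.random()                              # convex weight in [0, 1)
        synth[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synth
```
For example, `smote_like(np.random.rand(10, 2), n_new=50)` returns 50 points that all lie within the convex hull of the 10 inputs; SOMM's stated goal is to escape exactly this restriction by also consulting the majority-class neighbourhood.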
Related papers
- Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective [1.7851435784917604]
Class imbalance refers to the significant difference in the number of samples from different classes within a dataset.
This issue is prevalent in real-world classification tasks, such as software defect prediction, medical diagnosis, and fraud detection.
The synthetic minority oversampling technique (SMOTE) is widely used to address the class imbalance issue (cf. the SMOTE-style sketch above).
This paper proposes the Minimum Enclosing Ball SMOTE (MEB-SMOTE) method from a geometric perspective.
arXiv Detail & Related papers (2024-08-07T03:37:25Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes (a hypothetical majority-aware update in this spirit is sketched after this list).
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining.
Our method significantly improves retrieval performance, outperforming state-of-the-art methods by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique for tackling imbalanced learning by generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation [4.454557728745761]
Learning from class-imbalanced datasets poses challenges for machine learning algorithms.
We advance a novel data augmentation method (adapted from eXplainable AI) that generates synthetic, counterfactual instances in the minority class.
Several experiments using four different classifiers and 25 datasets are reported, which show that this Counterfactual Augmentation method (CFA) generates useful synthetic data points in the minority class.
arXiv Detail & Related papers (2021-11-05T14:14:06Z)
- Synthesising Multi-Modal Minority Samples for Tabular Data [3.7311680121118345]
Adding synthetic minority samples to the dataset before training is a popular technique to address class imbalance.
We propose a latent space framework which maps the multi-modal samples to a dense continuous latent space.
We show that our framework generates better synthetic data than the existing methods.
arXiv Detail & Related papers (2021-05-17T23:54:08Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers.
We propose a novel three-step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
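Several entries above, like SOMM itself, use majority-class information to place synthetic points rather than interpolating among minority instances alone. As a purely illustrative sketch, assuming a simple repulsion rule of our own (not the published algorithm of SOMM, M2m, or any paper listed here), candidates could be nudged away from nearby majority instances so that they adapt to a neighbourhood containing both classes:
```python
import numpy as np

def repel_from_majority(synth, X_maj, step=0.1, radius=1.0, iters=5):
    """Hypothetical adaptive update (our assumption, not a published
    algorithm): push each synthetic candidate away from majority
    instances within `radius`, so candidates drift out of regions
    the majority class occupies."""
    synth = np.array(synth, dtype=float)
    for _ in range(iters):
        for i, s in enumerate(synth):
            diff = s - X_maj                        # vectors pointing away from majority points
            dist = np.linalg.norm(diff, axis=1)
            close = dist < radius
            if not close.any():
                continue                            # no majority point nearby; keep candidate
            w = (radius - dist[close]) / radius     # closer majority points push harder
            push = diff[close] / (dist[close][:, None] + 1e-12)
            synth[i] = s + step * (w[:, None] * push).mean(axis=0)
    return synth

# Hypothetical pipeline combining both sketches:
# candidates = smote_like(X_min, n_new=100)
# X_synth = repel_from_majority(candidates, X_maj)
```
In this toy pipeline, diversity comes from the random interpolation inside the minority region and adaptivity from the majority-aware repulsion step; the actual update rules of the papers above differ and should be taken from the papers themselves.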
This list is automatically generated from the titles and abstracts of the papers on this site.