CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for
Imbalanced Data Classification
- URL: http://arxiv.org/abs/2004.03409v2
- Date: Sat, 17 Apr 2021 13:39:09 GMT
- Title: CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for
Imbalanced Data Classification
- Authors: Michał Koziarski
- Abstract summary: We propose a novel data-level algorithm for handling data imbalance in the classification task, the Synthetic Majority Undersampling Technique (SMUTE).
We combine SMUTE with SMOTE in the Combined Synthetic Oversampling and Undersampling Technique (CSMOUTE), which integrates SMOTE oversampling with SMUTE undersampling.
The results of the conducted experimental study demonstrate the usefulness of both the SMUTE and the CSMOUTE algorithms, especially when combined with more complex classifiers and when applied to datasets containing a large number of outliers.
- Score: 1.8275108630751844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose a novel data-level algorithm for handling data
imbalance in the classification task, Synthetic Majority Undersampling
Technique (SMUTE). SMUTE leverages the concept of interpolation of nearby
instances, previously introduced in the oversampling setting in SMOTE.
Furthermore, we combine both in the Combined Synthetic Oversampling and
Undersampling Technique (CSMOUTE), which integrates SMOTE oversampling with
SMUTE undersampling. The results of the conducted experimental study
demonstrate the usefulness of both the SMUTE and the CSMOUTE algorithms,
especially when combined with more complex classifiers, namely MLP and SVM, and
when applied to datasets containing a large number of outliers. This leads us
to the conclusion that the proposed approach shows promise for further
extensions accommodating local data characteristics, a direction discussed in
more detail in the paper.
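The abstract gives enough detail to sketch both samplers: SMUTE carries SMOTE's interpolation of nearby same-class instances over to the undersampling side, and CSMOUTE splits the balancing work between the two. Below is a minimal Python sketch of one plausible reading; the pair-merging interpretation of SMUTE, the ratio parameter that divides the class-size gap between oversampling and undersampling, and all function names are assumptions for illustration, not the paper's reference implementation.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def _interpolate(a, b, rng):
    # SMOTE-style interpolation: x_new = a + lam * (b - a), lam ~ U(0, 1)
    return a + rng.random() * (b - a)

def smote(X_min, n_new, k=5, rng=None):
    # Standard SMOTE: grow the minority class with points interpolated
    # between a random minority instance and one of its k nearest
    # minority-class neighbours.
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # index 0 of the neighbour list is the query point itself
        neigh = nn.kneighbors(X_min[i:i + 1], return_distance=False)[0][1:]
        synthetic.append(_interpolate(X_min[i], X_min[rng.choice(neigh)], rng))
    return np.vstack([X_min, np.asarray(synthetic)]) if synthetic else X_min.copy()

def smute(X_maj, n_remove, k=5, rng=None):
    # Assumed reading of SMUTE: replace a random majority instance and one
    # of its k nearest majority-class neighbours with their interpolation,
    # shrinking the class by one instance per step.
    rng = np.random.default_rng(rng)
    X = X_maj.copy()
    for _ in range(n_remove):
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        i = int(rng.integers(len(X)))
        neigh = nn.kneighbors(X[i:i + 1], return_distance=False)[0][1:]
        j = int(rng.choice(neigh))
        merged = _interpolate(X[i], X[j], rng)
        X = np.vstack([np.delete(X, [i, j], axis=0), merged])
    return X

def csmoute(X_maj, X_min, ratio=0.5, k=5, rng=None):
    # Hypothetical split of the class-size gap: a fraction `ratio` is closed
    # by SMOTE oversampling, the remainder by SMUTE undersampling.
    gap = len(X_maj) - len(X_min)
    n_over = int(ratio * gap)
    return smute(X_maj, gap - n_over, k, rng), smote(X_min, n_over, k, rng)

Under this reading, ratio=1.0 degenerates to pure SMOTE and ratio=0.0 to pure SMUTE, which matches the abstract's framing of CSMOUTE as an integration of the two samplers.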
Related papers
- Kernel-Based Enhanced Oversampling Method for Imbalanced Classification [10.112750055561877]
This paper introduces a novel oversampling technique designed to improve classification performance on imbalanced datasets.
The proposed method enhances the traditional SMOTE algorithm by incorporating convex combination and kernel-based weighting to generate synthetic samples that better represent the minority class.
arXiv Detail & Related papers (2025-04-12T09:24:23Z)
- Enhancing Synthetic Oversampling for Imbalanced Datasets Using Proxima-Orion Neighbors and q-Gaussian Weighting Technique [0.16385815610837165]
We propose a novel oversampling algorithm to increase the number of instances of minority class in an imbalanced dataset.
We select two instances, Proxima and Orion, from the set of all minority class instances, based on a combination of relative distance weights and density estimation of majority class instances.
We conduct a comprehensive experiment on 42 datasets extracted from KEEL software and eight datasets from the UCI ML repository to evaluate the usefulness of the proposed PO-QG algorithm.
arXiv Detail & Related papers (2025-01-27T05:34:19Z)
- AEMLO: AutoEncoder-Guided Multi-Label Oversampling [6.255095509216069]
AEMLO is an AutoEncoder-guided Oversampling technique for imbalanced multi-label data.
Extensive empirical studies show that AEMLO outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-23T14:01:33Z)
- A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE) [1.5186937600119894]
The paper proposes the Quantum-SMOTE method to solve the prevalent problem of class imbalance in machine learning datasets.
Quantum-SMOTE generates synthetic data points using quantum processes such as swap tests and quantum rotation.
The approach is tested on a public Telecom Churn dataset to determine its impact under varying proportions of synthetic data.
arXiv Detail & Related papers (2024-02-27T10:46:36Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- INGB: Informed Nonlinear Granular Ball Oversampling Framework for Noisy Imbalanced Classification [23.9207014576848]
In classification problems, the datasets are usually imbalanced, noisy or complex.
This paper proposes an informed nonlinear oversampling framework based on granular balls (INGB) as a new direction for oversampling.
arXiv Detail & Related papers (2023-07-03T01:55:20Z)
- BSGAN: A Novel Oversampling Technique for Imbalanced Pattern Recognitions [0.0]
Class imbalance problems (CIP) are one of the potential challenges in developing unbiased Machine Learning (ML) models for predictions.
CIP occurs when data samples are not equally distributed across two or more classes.
We propose a hybrid oversampling technique that combines the power of borderline SMOTE and Generative Adversarial Networks to generate more diverse data.
arXiv Detail & Related papers (2023-05-16T20:02:39Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers.
We propose a novel three-step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- Learning Affinity-Aware Upsampling for Deep Image Matting [83.02806488958399]
We show that learning affinity in upsampling provides an effective and efficient approach to exploit pairwise interactions in deep networks.
In particular, results on the Composition-1k matting dataset show that A2U achieves a 14% relative improvement in the SAD metric against a strong baseline.
Compared with the state-of-the-art matting network, we achieve 8% higher performance with only 40% model complexity.
arXiv Detail & Related papers (2020-11-29T05:09:43Z)
- Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes [60.479499225746295]
We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.
Our approximation is obtained from an improved sample complexity analysis for sparse spectrum Gaussian processes (SSGPs).
arXiv Detail & Related papers (2020-11-17T05:41:50Z)