GMOTE: Gaussian based minority oversampling technique for imbalanced
classification adapting tail probability of outliers
- URL: http://arxiv.org/abs/2105.03855v1
- Date: Sun, 9 May 2021 07:04:37 GMT
- Title: GMOTE: Gaussian based minority oversampling technique for imbalanced
classification adapting tail probability of outliers
- Authors: Seung Jee Yang, Kyung Joon Cha
- Abstract summary: Data-level approaches mainly use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to solve the problem.
In this paper, we propose the Gaussian-based minority oversampling technique (GMOTE), which takes a statistical perspective on imbalanced datasets.
When GMOTE is combined with a classification and regression tree (CART) or a support vector machine (SVM), it shows better accuracy and F1-score.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification of imbalanced data is one of the common problems in
data mining. Imbalanced data substantially degrades the performance of
standard classification models. Data-level approaches mainly address the
problem with oversampling methods such as the synthetic minority oversampling
technique (SMOTE). However, because methods such as SMOTE generate instances
by linear interpolation, the synthetic data space can take a polygonal shape.
Moreover, these oversampling methods can generate outliers of the minority
class. In this paper, we propose the Gaussian-based minority oversampling
technique (GMOTE), which takes a statistical perspective on imbalanced
datasets. To avoid linear interpolation and to account for outliers, the
proposed method generates instances from a Gaussian mixture model. Motivated
by the clustering-based multivariate Gaussian outlier score (CMGOS), we
propose to adapt the tail probability of instances through the Mahalanobis
distance so that local outliers are taken into account. Experiments were
carried out on a representative set of benchmark datasets, and the
performance of GMOTE was compared with that of other methods such as SMOTE.
When GMOTE is combined with a classification and regression tree (CART) or a
support vector machine (SVM), it shows better accuracy and F1-score.
Experimental results demonstrate its robust performance.
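To make the procedure concrete, here is a minimal sketch of a GMOTE-style
sampler as the abstract describes it. This is our reading, not the authors'
code: the function name and defaults are hypothetical, and the paper's
CMGOS-motivated tail-probability adaptation may differ from the plain
chi-square cutoff on the Mahalanobis distance used below.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture

def gmote_sample(X_min, n_new, n_components=2, tail=0.05, seed=0):
    """Oversample the minority class X_min (n x d) with a Gaussian mixture,
    rejecting candidates that fall in the tail of their generating component
    (squared Mahalanobis distance above the chi-square cutoff)."""
    d = X_min.shape[1]
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=seed).fit(X_min)
    # For x ~ N(mu_k, Sigma_k), the squared Mahalanobis distance to mu_k is
    # chi2-distributed with d degrees of freedom; keep the central mass.
    cutoff = chi2.ppf(1.0 - tail, df=d)
    accepted = []
    while len(accepted) < n_new:
        cands, comps = gmm.sample(n_new)   # candidates + component labels
        for x, k in zip(cands, comps):
            diff = x - gmm.means_[k]
            d2 = diff @ gmm.precisions_[k] @ diff  # squared Mahalanobis
            if d2 <= cutoff:
                accepted.append(x)
                if len(accepted) == n_new:
                    break
    return np.asarray(accepted)
```

Under this sketch, the balanced training set for CART or SVM would simply be
the original data stacked with the accepted synthetic minority instances.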
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
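The minority-majority mixing idea in the entry above admits a compact sketch.
This is an illustrative, hypothetical reduction (the paper's iterative
procedure is more elaborate), and biasing the mixing weight toward the
minority point is our assumption:

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, lam_low=0.5, seed=0):
    """Create synthetic minority points as convex combinations of a random
    minority sample and a random majority sample. The mixing weight `lam`
    is kept above `lam_low` so each synthetic point stays closer to its
    minority anchor and the minority label remains plausible."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_min), n_new)   # minority anchors
    j = rng.integers(0, len(X_maj), n_new)   # majority partners
    lam = rng.uniform(lam_low, 1.0, size=(n_new, 1))
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]
```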
- SUnAA: Sparse Unmixing using Archetypal Analysis [62.997667081978825]
This paper introduces a new sparse unmixing technique based on archetypal analysis (SUnAA).
First, we design a new model based on archetypal analysis.
arXiv Detail & Related papers (2023-08-09T07:58:33Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive sample weights for a reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
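One plausible reading of the compound batch normalization idea above is a set
of parallel normalization branches mixed by learned weights, standing in for
a Gaussian mixture over the feature space. The sketch below is an
assumption-laden simplification, not the paper's implementation, which may
estimate mixture responsibilities rather than learn global weights:

```python
import torch
import torch.nn as nn

class CompoundBatchNorm2d(nn.Module):
    """Hypothetical compound BN: K BatchNorm branches whose outputs are
    mixed by softmax weights, approximating normalization under a
    Gaussian mixture model of the features."""
    def __init__(self, num_features, k=3):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.BatchNorm2d(num_features) for _ in range(k)])
        self.logits = nn.Parameter(torch.zeros(k))  # learned mixture weights

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * bn(x) for wi, bn in zip(w, self.branches))
```

For example, `CompoundBatchNorm2d(64, k=3)(torch.randn(8, 64, 14, 14))` drops
in where a plain `nn.BatchNorm2d(64)` would go.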
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets.
Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach.
The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z)
- SMOTified-GAN for class imbalanced pattern classification problems [0.41998444721319217]
We propose a novel two-phase oversampling approach that has the synergy of SMOTE and GAN.
The experimental results show that the sample quality of the minority class(es) is improved in a variety of tested benchmark datasets.
arXiv Detail & Related papers (2021-08-06T06:14:05Z)
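Since SMOTE is both the first phase of SMOTified-GAN above and the
linear-interpolation baseline that GMOTE argues against, a textbook version
is worth spelling out. The algorithm is the standard one from Chawla et al.
(2002); the function name and defaults here are ours:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Textbook SMOTE: new points are linear interpolations between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nbrs.kneighbors(X_min)          # column 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]  # skip the self column
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

In the SMOTified-GAN pipeline, these interpolated points would then be
refined by a GAN rather than used directly.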
- Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN.
We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
- Handling missing data in model-based clustering [0.0]
We propose two methods to fit Gaussian mixtures in the presence of missing data.
Both methods use a variant of the Monte Carlo Expectation-Maximisation algorithm for data augmentation.
We show that the proposed methods outperform the multiple imputation approach, both in terms of cluster identification and density estimation.
arXiv Detail & Related papers (2020-06-04T15:36:31Z)
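The Monte Carlo E-step that the last entry relies on boils down to drawing
the missing coordinates of a point from the conditional Gaussian given its
observed coordinates. Below is a minimal sketch of that single draw for one
mixture component's parameters (mu, sigma); a full MCEM fit, which the paper
develops, would alternate such draws with standard mixture M-step updates:

```python
import numpy as np

def draw_missing(x, mu, sigma, rng):
    """Monte Carlo data-augmentation step: sample the NaN coordinates of x
    from the conditional Gaussian N(mu, sigma) given the observed ones."""
    m = np.isnan(x)                      # missing mask
    if not m.any():
        return x.copy()
    o = ~m                               # observed mask
    soo = sigma[np.ix_(o, o)]
    smo = sigma[np.ix_(m, o)]
    smm = sigma[np.ix_(m, m)]
    # conditional mean: mu_m + Smo Soo^{-1} (x_o - mu_o)
    cond_mean = mu[m] + smo @ np.linalg.solve(soo, x[o] - mu[o])
    # conditional covariance (Schur complement): Smm - Smo Soo^{-1} Som
    cond_cov = smm - smo @ np.linalg.solve(soo, smo.T)
    out = x.copy()
    out[m] = rng.multivariate_normal(cond_mean, cond_cov)
    return out
```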