GMOTE: Gaussian based minority oversampling technique for imbalanced
classification adapting tail probability of outliers
- URL: http://arxiv.org/abs/2105.03855v1
- Date: Sun, 9 May 2021 07:04:37 GMT
- Title: GMOTE: Gaussian based minority oversampling technique for imbalanced
classification adapting tail probability of outliers
- Authors: Seung Jee Yang, Kyung Joon Cha
- Abstract summary: Data-level approaches mainly use oversampling methods, such as the synthetic minority oversampling technique (SMOTE), to solve the problem.
In this paper, we propose the Gaussian-based minority oversampling technique (GMOTE), which takes a statistical perspective on imbalanced datasets.
When GMOTE is combined with a classification and regression tree (CART) or a support vector machine (SVM), it shows better accuracy and F1-score.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification of imbalanced data is one of the common problems in
data mining. Imbalanced data substantially degrades the performance of
standard classification models. Data-level approaches mainly address the
problem with oversampling methods such as the synthetic minority oversampling
technique (SMOTE). However, because methods such as SMOTE generate instances
by linear interpolation, the synthetic data space can take a polygonal shape.
Moreover, these oversampling methods can generate outliers of the minority
class. In this paper, we propose the Gaussian-based minority oversampling
technique (GMOTE), which takes a statistical perspective on imbalanced
datasets. To avoid linear interpolation and to account for outliers, the
proposed method generates instances from a Gaussian mixture model. Motivated
by the clustering-based multivariate Gaussian outlier score (CMGOS), we
propose to adapt the tail probability of instances through the Mahalanobis
distance so that local outliers are taken into account. Experiments were
carried out on a representative set of benchmark datasets, and the
performance of GMOTE was compared with that of other methods such as SMOTE.
When GMOTE is combined with a classification and regression tree (CART) or a
support vector machine (SVM), it shows better accuracy and F1-score.
Experimental results demonstrate its robust performance.
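To make the procedure concrete, here is a minimal sketch of a GMOTE-style
sampler as the abstract describes it. This is our reading, not the authors'
code: the function name and defaults are hypothetical, and the paper's
CMGOS-motivated tail-probability adaptation may differ from the plain
chi-square cutoff on the Mahalanobis distance used below.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture

def gmote_sample(X_min, n_new, n_components=2, tail=0.05, seed=0):
    """Oversample the minority class X_min (n x d) with a Gaussian mixture,
    rejecting candidates that fall in the tail of their generating component
    (squared Mahalanobis distance above the chi-square cutoff)."""
    d = X_min.shape[1]
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=seed).fit(X_min)
    # For x ~ N(mu_k, Sigma_k), the squared Mahalanobis distance to mu_k is
    # chi2-distributed with d degrees of freedom; keep the central mass.
    cutoff = chi2.ppf(1.0 - tail, df=d)
    accepted = []
    while len(accepted) < n_new:
        cands, comps = gmm.sample(n_new)   # candidates + component labels
        for x, k in zip(cands, comps):
            diff = x - gmm.means_[k]
            d2 = diff @ gmm.precisions_[k] @ diff  # squared Mahalanobis
            if d2 <= cutoff:
                accepted.append(x)
                if len(accepted) == n_new:
                    break
    return np.asarray(accepted)
```

Under this sketch, the balanced training set for CART or SVM would simply be
the original data stacked with the accepted synthetic minority instances.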
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
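The minority-majority mixing idea in the entry above admits a compact sketch.
This is an illustrative, hypothetical reduction (the paper's iterative
procedure is more elaborate), and biasing the mixing weight toward the
minority point is our assumption:

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, lam_low=0.5, seed=0):
    """Create synthetic minority points as convex combinations of a random
    minority sample and a random majority sample. The mixing weight `lam`
    is kept above `lam_low` so each synthetic point stays closer to its
    minority anchor and the minority label remains plausible."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_min), n_new)   # minority anchors
    j = rng.integers(0, len(X_maj), n_new)   # majority partners
    lam = rng.uniform(lam_low, 1.0, size=(n_new, 1))
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]
```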
- SUnAA: Sparse Unmixing using Archetypal Analysis [62.997667081978825]
This paper introduces a new sparse unmixing technique based on archetypal analysis (SUnAA).
First, we design a new model based on archetypal analysis.
arXiv Detail & Related papers (2023-08-09T07:58:33Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive sample weights for a reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
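One plausible reading of the compound batch normalization idea above is a set
of parallel normalization branches mixed by learned weights, standing in for
a Gaussian mixture over the feature space. The sketch below is an
assumption-laden simplification, not the paper's implementation, which may
estimate mixture responsibilities rather than learn global weights:

```python
import torch
import torch.nn as nn

class CompoundBatchNorm2d(nn.Module):
    """Hypothetical compound BN: K BatchNorm branches whose outputs are
    mixed by softmax weights, approximating normalization under a
    Gaussian mixture model of the features."""
    def __init__(self, num_features, k=3):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.BatchNorm2d(num_features) for _ in range(k)])
        self.logits = nn.Parameter(torch.zeros(k))  # learned mixture weights

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * bn(x) for wi, bn in zip(w, self.branches))
```

For example, `CompoundBatchNorm2d(64, k=3)(torch.randn(8, 64, 14, 14))` drops
in where a plain `nn.BatchNorm2d(64)` would go.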
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets.
Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach.
The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z)
- SMOTified-GAN for class imbalanced pattern classification problems [0.41998444721319217]
We propose a novel two-phase oversampling approach that has the synergy of SMOTE and GAN.
The experimental results show that the sample quality of the minority class(es) is improved in a variety of tested benchmark datasets.
arXiv Detail & Related papers (2021-08-06T06:14:05Z)
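Since SMOTE is both the first phase of SMOTified-GAN above and the
linear-interpolation baseline that GMOTE argues against, a textbook version
is worth spelling out. The algorithm is the standard one from Chawla et al.
(2002); the function name and defaults here are ours:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Textbook SMOTE: new points are linear interpolations between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nbrs.kneighbors(X_min)          # column 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]  # skip the self column
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

In the SMOTified-GAN pipeline, these interpolated points would then be
refined by a GAN rather than used directly.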
- Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN.
We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
- Handling missing data in model-based clustering [0.0]
We propose two methods to fit Gaussian mixtures in the presence of missing data.
Both methods use a variant of the Monte Carlo Expectation-Maximisation algorithm for data augmentation.
We show that the proposed methods outperform the multiple imputation approach, both in terms of cluster identification and density estimation.
arXiv Detail & Related papers (2020-06-04T15:36:31Z)
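The Monte Carlo E-step that the last entry relies on boils down to drawing
the missing coordinates of a point from the conditional Gaussian given its
observed coordinates. Below is a minimal sketch of that single draw for one
mixture component's parameters (mu, sigma); a full MCEM fit, which the paper
develops, would alternate such draws with standard mixture M-step updates:

```python
import numpy as np

def draw_missing(x, mu, sigma, rng):
    """Monte Carlo data-augmentation step: sample the NaN coordinates of x
    from the conditional Gaussian N(mu, sigma) given the observed ones."""
    m = np.isnan(x)                      # missing mask
    if not m.any():
        return x.copy()
    o = ~m                               # observed mask
    soo = sigma[np.ix_(o, o)]
    smo = sigma[np.ix_(m, o)]
    smm = sigma[np.ix_(m, m)]
    # conditional mean: mu_m + Smo Soo^{-1} (x_o - mu_o)
    cond_mean = mu[m] + smo @ np.linalg.solve(soo, x[o] - mu[o])
    # conditional covariance (Schur complement): Smm - Smo Soo^{-1} Som
    cond_cov = smm - smo @ np.linalg.solve(soo, smo.T)
    out = x.copy()
    out[m] = rng.multivariate_normal(cond_mean, cond_cov)
    return out
```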