Gamma distribution-based sampling for imbalanced data
- URL: http://arxiv.org/abs/2009.10343v1
- Date: Tue, 22 Sep 2020 06:39:13 GMT
- Title: Gamma distribution-based sampling for imbalanced data
- Authors: Firuz Kamalov and Dmitry Denisov
- Abstract summary: Imbalanced class distribution is a common problem in a number of fields including medical diagnostics, fraud detection, and others.
We propose a novel method for balancing the class distribution in data through intelligent resampling of the minority class instances.
- Score: 6.85316573653194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imbalanced class distribution is a common problem in a number of fields
including medical diagnostics, fraud detection, and others. It causes bias in
classification algorithms leading to poor performance on the minority class
data. In this paper, we propose a novel method for balancing the class
distribution in data through intelligent resampling of the minority class
instances. The proposed method is based on generating new minority instances in
the neighborhood of the existing minority points via a gamma distribution. Our
method offers a natural and coherent approach to balancing the data. We conduct
a comprehensive numerical analysis of the new sampling technique. The
experimental results show that the proposed method outperforms the existing
state-of-the-art methods for imbalanced data. Concretely, the new sampling
technique produces the best results on 12 out of 24 real-life and synthetic
datasets. For comparison, the SMOTE method achieves the top score on
synthetic datasets. For comparison, the SMOTE method achieves the top score on
only 1 dataset. We conclude that the new technique offers a simple yet
effective sampling approach to balance data.
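The abstract describes generating new minority instances near existing minority points, with the step size drawn from a gamma distribution. A minimal sketch of that idea follows; the neighbor selection and the gamma shape/scale parameters are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def gamma_oversample(X_min, n_new, shape=2.0, scale=0.1, rng=None):
    """Generate synthetic minority samples near existing minority points.

    Each new point is placed along the direction from a random minority
    point toward its nearest minority neighbor, at a distance drawn from
    a gamma distribution (shape/scale here are assumptions).
    """
    rng = np.random.default_rng(rng)
    n, d = X_min.shape
    # pairwise distances among minority points, to find nearest neighbors
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = dist.argmin(axis=1)                 # nearest minority neighbor of each point

    base = rng.integers(0, n, size=n_new)    # random minority anchor points
    direction = X_min[nn[base]] - X_min[base]
    step = rng.gamma(shape, scale, size=(n_new, 1))  # gamma-distributed step length
    return X_min[base] + step * direction

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = gamma_oversample(X_min, n_new=5, rng=0)
print(X_new.shape)  # (5, 2)
```

Because the gamma density is skewed toward small values, most synthetic points stay close to their anchor, which matches the "neighborhood of the existing minority points" intuition in the abstract.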
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
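The minority-majority mixing described in this entry can be illustrated with a mixup-style interpolation; the beta-distributed weights and the minority-side bias below are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.75, rng=None):
    """Create synthetic samples by interpolating minority and majority points.

    Each new sample lies on the segment between a random minority point and
    a random majority point, biased toward the minority side (alpha is an
    illustrative assumption).
    """
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    lam = np.maximum(lam, 1.0 - lam)   # keep new points closer to the minority class
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]

X_min = np.array([[0.0, 0.0]])
X_maj = np.array([[1.0, 1.0], [2.0, 0.0]])
X_new = mix_minority_majority(X_min, X_maj, n_new=4, rng=0)
print(X_new.shape)  # (4, 2)
```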
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- A Novel Hybrid Sampling Framework for Imbalanced Learning [0.0]
"SMOTE-RUS-NC" has been compared with other state-of-the-art sampling techniques.
Rigorous experimentation has been conducted on 26 imbalanced datasets.
arXiv Detail & Related papers (2022-08-20T07:04:00Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- Imbalanced Classification via Explicit Gradient Learning From Augmented Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
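The per-sample importance weighting described in this entry can be sketched as follows; the softmax-over-losses weighting and the temperature `lam` are assumptions chosen to illustrate the idea, not ABSGD's exact formulation:

```python
import numpy as np

def weighted_batch_gradient(per_sample_losses, per_sample_grads, lam=1.0):
    """Combine per-sample gradients using loss-dependent importance weights.

    Weights are a softmax over scaled per-sample losses, so harder
    (higher-loss) samples contribute more to the mini-batch update.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    z = losses / lam
    z -= z.max()                       # numerical stability before exponentiating
    w = np.exp(z)
    w /= w.sum()                       # importance weights sum to 1
    grads = np.asarray(per_sample_grads, dtype=float)
    return (w[:, None] * grads).sum(axis=0)

# two samples: one easy (loss 0.1), one hard (loss 2.0)
g = weighted_batch_gradient([0.1, 2.0], [[1.0, 0.0], [0.0, 1.0]])
print(g)  # the high-loss sample dominates the combined gradient
```

The resulting vector can then drive an ordinary momentum-SGD step, which is why the summary calls the method a simple modification of momentum SGD.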
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN.
We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them generalize poorly under a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
- UGRWO-Sampling for COVID-19 dataset: A modified random walk under-sampling approach based on graphs to imbalanced data classification [2.15242029196761]
This paper proposes a new graph-based RWO-Sampling (Random Walk Over-Sampling) method for imbalanced datasets.
Two schemes based on under-sampling and over-sampling methods are introduced to keep the proximity information robust to noises and outliers.
arXiv Detail & Related papers (2020-02-10T03:29:24Z)
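For context on the entry above, classic RWO-Sampling perturbs each minority point by a scaled Gaussian step per attribute; the paper's contribution adds graph-based neighborhood information on top of this. The sketch below shows only the classic random-walk step, and its scaling is an assumption from the original RWO formulation:

```python
import numpy as np

def rwo_sample(X_min, n_new, rng=None):
    """Random Walk Over-Sampling (classic form, without the graph-based
    extensions): perturb random minority anchors by a per-attribute
    Gaussian step scaled by std / sqrt(n)."""
    rng = np.random.default_rng(rng)
    n, d = X_min.shape
    sigma = X_min.std(axis=0, ddof=1)             # per-attribute sample std
    base = X_min[rng.integers(0, n, size=n_new)]  # random minority anchors
    step = rng.standard_normal((n_new, d)) * sigma / np.sqrt(n)
    return base - step

X_min = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 1.0]])
X_new = rwo_sample(X_min, n_new=6, rng=0)
print(X_new.shape)  # (6, 2)
```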
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.