Generative Oversampling for Imbalanced Data via Majority-Guided VAE
- URL: http://arxiv.org/abs/2302.10910v1
- Date: Tue, 14 Feb 2023 06:35:23 GMT
- Title: Generative Oversampling for Imbalanced Data via Majority-Guided VAE
- Authors: Qingzhong Ai, Pengyun Wang, Lirong He, Liangjian Wen, Lujia Pan,
Zenglin Xu
- Abstract summary: We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
- Score: 15.93867386081279
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Learning with imbalanced data is a challenging problem in deep learning.
Over-sampling is a widely used technique to re-balance the sampling
distribution of training data. However, most existing over-sampling methods
only use intra-class information of minority classes to augment the data but
ignore the inter-class relationships with the majority ones, which is prone to
overfitting, especially when the imbalance ratio is large. To address this
issue, we propose a novel over-sampling model, called Majority-Guided
VAE (MGVAE), which generates new minority samples under the guidance of a
majority-based prior. In this way, the newly generated minority samples can
inherit the diversity and richness of the majority ones, thus mitigating
overfitting in downstream tasks. Furthermore, to prevent model collapse under
limited data, we first pre-train MGVAE on sufficient majority samples and then
fine-tune it on minority samples with Elastic Weight Consolidation (EWC)
regularization. Experimental results on benchmark image datasets and real-world
tabular data show that MGVAE achieves competitive improvements over other
over-sampling methods in downstream classification tasks, demonstrating the
effectiveness of our method.
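The training recipe described in the abstract (pre-train a VAE on abundant majority samples, then fine-tune on scarce minority samples under an EWC penalty) can be illustrated with a small PyTorch sketch. Everything below, the tiny MLP VAE, the synthetic Gaussian data, and all hyperparameters, is an assumption for illustration, not the paper's actual architecture or settings.

```python
# Minimal sketch: majority pre-training, then EWC-regularized minority
# fine-tuning, then sampling synthetic minority points from the decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=16, z_dim=4, h_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    rec = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu**2 - logvar.exp(), dim=1))
    return rec + kl

def train(model, data, epochs, ewc=None, lam=100.0):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        x_hat, mu, logvar = model(data)
        loss = elbo_loss(data, x_hat, mu, logvar)
        if ewc is not None:
            # EWC: penalize drift on parameters that were important
            # for the majority-trained model.
            star, fisher = ewc
            for n, p in model.named_parameters():
                loss = loss + lam * (fisher[n] * (p - star[n]) ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

def fisher_diagonal(model, data):
    """Crude diagonal Fisher estimate: squared gradients of the loss."""
    model.zero_grad()
    x_hat, mu, logvar = model(data)
    elbo_loss(data, x_hat, mu, logvar).backward()
    return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

torch.manual_seed(0)
majority = torch.randn(2000, 16)            # abundant majority samples
minority = torch.randn(40, 16) * 0.5 + 2.0  # scarce minority samples

vae = VAE()
train(vae, majority, epochs=200)            # 1) pre-train on majority data
star = {n: p.detach().clone() for n, p in vae.named_parameters()}
fisher = fisher_diagonal(vae, majority)
train(vae, minority, epochs=200, ewc=(star, fisher))  # 2) EWC fine-tune

with torch.no_grad():                       # 3) oversample the minority class
    synthetic = vae.dec(torch.randn(500, 4))
print(synthetic.shape)  # torch.Size([500, 16])
```

The EWC term anchors the parameters that mattered for the majority-trained model (as measured by the Fisher diagonal), so the handful of minority samples can shift the model without erasing the structure learned from the majority class.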
Related papers
- Self-Guided Generation of Minority Samples Using Diffusion Models [57.319845580050924]
We present a novel approach for generating minority samples that lie in low-density regions of a data manifold.
Our framework is built upon diffusion models, leveraging the principle of guided sampling.
Experiments on benchmark real datasets demonstrate that our approach can greatly improve the capability of creating realistic low-likelihood minority instances.
arXiv Detail & Related papers (2024-07-16T10:03:29Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
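The cross-class mixing idea summarized above can be sketched as a mixup-style convex combination; this is a generic illustration under assumed details (weights, neighbor selection), not the paper's actual algorithm.

```python
# Synthetic minority points as convex combinations of a minority anchor
# and a majority sample, with the mixing weight kept near the minority side.
import numpy as np

rng = np.random.default_rng(0)
minority = rng.normal(2.0, 0.5, size=(30, 8))    # scarce class
majority = rng.normal(0.0, 1.0, size=(500, 8))   # abundant class

def mix_oversample(minority, majority, n_new, max_weight=0.3, rng=rng):
    """Blend each synthetic point from one minority and one majority sample."""
    i = rng.integers(0, len(minority), size=n_new)
    j = rng.integers(0, len(majority), size=n_new)
    lam = rng.uniform(0.0, max_weight, size=(n_new, 1))  # majority share
    return (1.0 - lam) * minority[i] + lam * majority[j]

synthetic = mix_oversample(minority, majority, n_new=470)
print(synthetic.shape)  # (470, 8)
```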
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Don't Play Favorites: Minority Guidance for Diffusion Models [59.75996752040651]
We present a novel framework that can make the generation process of the diffusion models focus on the minority samples.
We develop minority guidance, a sampling technique that can guide the generation process toward regions with desired likelihood levels.
arXiv Detail & Related papers (2023-01-29T03:08:47Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Stop Oversampling for Class Imbalance Learning: A Critical Review [0.9208007322096533]
Oversampling has been employed to overcome the challenge of learning from imbalanced datasets.
The fundamental difficulty with oversampling approaches is that, given a real-life population, the synthesized samples may not truly belong to the minority class.
We devised a new oversampling evaluation system based on hiding a number of majority examples and comparing them to those generated by the oversampling process.
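One plausible instantiation of the evaluation idea summarized above: hide a subset of real samples, oversample from the remainder, and check how close the synthetic points come to the hidden ground truth via nearest-neighbor distances. The stand-in oversampler and the distance metric here are assumptions, not the paper's exact protocol.

```python
# Hide real examples, generate synthetics from what remains, and measure
# how well the oversampler recovers the hidden points.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 8))
hidden, visible = real[:50], real[50:]           # hide 50 real examples

def naive_oversample(x, n_new, noise=0.1, rng=rng):
    """Stand-in oversampler: jitter randomly chosen visible samples."""
    idx = rng.integers(0, len(x), size=n_new)
    return x[idx] + rng.normal(0.0, noise, size=(n_new, x.shape[1]))

synthetic = naive_oversample(visible, n_new=50)

# Mean distance from each hidden sample to its nearest synthetic one:
# lower means the oversampler produces plausible unseen class members.
d = np.linalg.norm(hidden[:, None, :] - synthetic[None, :, :], axis=2)
print(d.min(axis=1).mean())
```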
arXiv Detail & Related papers (2022-02-04T15:11:11Z)
- GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods, such as the Synthetic Minority Oversampling Technique (SMOTE), to address the problem.
In this paper, we propose a Gaussian-based minority oversampling technique (GMOTE) with a statistical perspective for imbalanced datasets.
When GMOTE is combined with a classification and regression tree (CART) or a support vector machine (SVM), it shows better accuracy and F1-score.
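The basic Gaussian sampling idea behind such methods can be sketched in a few lines: fit a multivariate Gaussian to the minority class and draw synthetic points from it. GMOTE itself additionally adapts the tail probability of outliers, which this simplified sketch omits.

```python
# Fit a Gaussian to the minority class and sample synthetic points from it.
import numpy as np

rng = np.random.default_rng(0)
minority = rng.normal(2.0, 0.5, size=(30, 8))

mean = minority.mean(axis=0)
# Small ridge keeps the covariance well-conditioned for few samples.
cov = np.cov(minority, rowvar=False) + 1e-6 * np.eye(minority.shape[1])
synthetic = rng.multivariate_normal(mean, cov, size=470)
print(synthetic.shape)  # (470, 8)
```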
arXiv Detail & Related papers (2021-05-09T07:04:37Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often causes performance degradation in conventional classifiers.
We propose a novel three step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN.
We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
- Minority Class Oversampling for Tabular Data with Deep Generative Models [4.976007156860967]
We study the ability of deep generative models to provide realistic samples that improve performance on imbalanced classification tasks via oversampling.
Our experiments show that the sampling method does not affect sample quality, but runtime varies widely.
We also observe that the improvements in performance metrics, while shown to be significant, are often minor in absolute terms.
arXiv Detail & Related papers (2020-05-07T21:35:57Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them generalize poorly to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
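The major-to-minor translation idea can be sketched as gradient ascent on a classifier's minority-class logit, starting from a majority-class input and perturbing it until it reads as minority. The toy classifier, step sizes, and stopping rule below are placeholders; the actual M2m algorithm's rejection and labeling rules are omitted.

```python
# Translate majority samples toward the minority class by following the
# gradient of a classifier's minority-class logit with respect to the input.
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in classifier; in practice this would be pre-trained on the data.
clf = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

def translate(x_major, target_class=1, steps=20, lr=0.5):
    x = x_major.clone().requires_grad_(True)
    for _ in range(steps):
        logit = clf(x)[:, target_class].sum()
        (grad,) = torch.autograd.grad(logit, x)
        with torch.no_grad():
            x += lr * grad          # move toward the minority class
    return x.detach()

x_major = torch.randn(8, 16)        # batch of majority samples
x_synthetic = translate(x_major)    # relabeled as minority for training
print(x_synthetic.shape)  # torch.Size([8, 16])
```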
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.