GAN based Data Augmentation to Resolve Class Imbalance
- URL: http://arxiv.org/abs/2206.05840v1
- Date: Sun, 12 Jun 2022 21:21:55 GMT
- Title: GAN based Data Augmentation to Resolve Class Imbalance
- Authors: Sairamvinay Vijayaraghavan, Terry Guan, Jason (Jinxiao) Song
- Abstract summary: In many related tasks, the datasets have a very small number of observed fraud cases.
This imbalance may impact any learning model's behavior by causing it to predict all labels as the majority class.
We trained a Generative Adversarial Network (GAN) to generate a large number of convincing (and reliable) synthetic examples of the minority class.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The number of credit card frauds has been growing as technology grows and people take advantage of it. Therefore, it is very important to implement a robust and effective method to detect such frauds. Machine learning algorithms are appropriate for these tasks since they try to maximize the accuracy of predictions and hence can be relied upon. However, there is a flaw wherein machine learning models may not perform well due to the presence of an imbalance in the class distribution within the sample set. In many related tasks, the datasets have a very small number of observed fraud cases (sometimes around 1 percent positive fraud instances). Therefore, the presence of this imbalance may bias any learning model's behavior toward predicting all labels as the majority class, leaving no scope for generalization in the predictions made by the model. We trained a Generative Adversarial Network (GAN) to generate a large number of convincing (and reliable) synthetic examples of the minority class that can be used to alleviate the class imbalance within the training set and hence generalize the learning of the data more effectively.
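The abstract's approach lends itself to a compact sketch. Below is a minimal PyTorch rendering of GAN-based minority-class oversampling on tabular features; the feature dimension, network sizes, and training schedule are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative assumptions: 30 tabular features (a common credit card
# fraud data set width), a 16-dim latent space, and small MLPs.
N_FEATURES, LATENT = 30, 16

generator = nn.Sequential(
    nn.Linear(LATENT, 64), nn.ReLU(),
    nn.Linear(64, N_FEATURES),
)
discriminator = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1),  # raw logit; the loss applies the sigmoid
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_minority):
    """One adversarial update on a batch of real minority-class rows."""
    batch = real_minority.size(0)
    fake = generator(torch.randn(batch, LATENT))

    # Discriminator: separate real minority rows from generated ones.
    d_loss = (bce(discriminator(real_minority), torch.ones(batch, 1))
              + bce(discriminator(fake.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: make the discriminator call the fakes real.
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# After training: synthesize rows to rebalance the training set, e.g.
# synthetic = generator(torch.randn(n_needed, LATENT)).detach()
```

The synthetic rows are then concatenated with the real training set until the two classes are roughly balanced, after which a downstream classifier is trained on the augmented data as usual.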
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data [0.0]
We develop a theoretical foundation to model human annotation errors and the extreme imbalance typical of real-world fraud detection data sets.
We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection classification (a sketch of both metrics follows this entry).
arXiv Detail & Related papers (2022-08-25T07:30:31Z)
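Both metrics are standard and easy to compute; a minimal sketch with scikit-learn, using toy labels purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Toy, heavily imbalanced labels purely for illustration.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 0, 0, 1, 0])

f1 = f1_score(y_true, y_pred)

# g-mean: geometric mean of sensitivity (recall on the positive class)
# and specificity (recall on the negative class).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
g_mean = np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))

print(f"F1 = {f1:.3f}, g-mean = {g_mean:.3f}")
```

The imbalanced-learn package also provides geometric_mean_score, which computes the same g-mean directly.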
- Throwing Away Data Improves Worst-Class Error in Imbalanced Classification [36.91428748713018]
Class imbalances pervade classification problems, yet their treatment differs in theory and practice.
We take on the challenge of developing learning theory able to describe the worst-class error of classifiers over linearly-separable data.
arXiv Detail & Related papers (2022-05-23T23:43:18Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data (a simplified sketch follows this entry).
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
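The core idea maps each sample's training loss to a weight through a small learnable network. A simplified sketch of that weighting mapping follows; the architecture is an illustrative assumption, and the paper's class-aware task grouping and bilevel meta-update on clean validation data are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightNet(nn.Module):
    """Learnable mapping from a per-sample loss value to a weight in (0, 1)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, per_sample_loss):
        return self.net(per_sample_loss.unsqueeze(1)).squeeze(1)

weight_net = WeightNet()

def reweighted_loss(logits, targets):
    # Per-sample losses, then weights as an explicit function of loss.
    losses = F.cross_entropy(logits, targets, reduction="none")
    weights = weight_net(losses.detach())
    return (weights * losses).mean()
```

In the paper, the weighting net itself is trained against a small unbiased validation set, so it learns when to down-weight corrupted labels and up-weight rare classes.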
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing concern with the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data (a sketch of the pseudo-labeling step follows this entry).
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
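The pseudo-labeling step itself is simple; a minimal sketch follows, where the confidence threshold is an illustrative assumption and the paper's fairness-aware pre-processing is not shown.

```python
import torch
import torch.nn.functional as F

def pseudo_label(model, unlabeled_x, threshold=0.95):
    """Assign hard labels to unlabeled points the model is confident about."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold  # keep only confident predictions
    return unlabeled_x[keep], labels[keep]

# The kept (x, pseudo-y) pairs are merged with the labeled set for retraining.
```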
- Imbalanced Image Classification with Complement Cross Entropy [10.35173901214638]
We study cross entropy, which mostly ignores output scores on incorrect classes.
This work finds that exploiting predicted probabilities on incorrect classes improves prediction accuracy for imbalanced image classification.
The proposed loss makes the ground-truth class overwhelm the other classes in terms of softmax probability (a hedged sketch follows this entry).
arXiv Detail & Related papers (2020-09-04T13:46:24Z)
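One plausible rendering of such a loss adds a complement-entropy term to the usual cross entropy, spreading probability mass evenly over the incorrect classes; the sketch below is an interpretation of the idea, not the paper's verbatim formulation.

```python
import torch
import torch.nn.functional as F

def complement_cross_entropy(logits, targets, gamma=-1.0, eps=1e-7):
    """Cross entropy plus a complement-entropy term (interpretive sketch).

    With gamma = -1, minimizing this loss *maximizes* the entropy of the
    distribution over incorrect classes, flattening their probabilities
    so the ground-truth class dominates the softmax output.
    """
    ce = F.cross_entropy(logits, targets)

    probs = F.softmax(logits, dim=1)                # (N, K)
    k = probs.size(1)
    p_true = probs.gather(1, targets.unsqueeze(1))  # (N, 1)

    # Renormalize over the K-1 incorrect classes, zeroing the true class.
    p_wrong = probs / (1.0 - p_true + eps)
    p_wrong = p_wrong.scatter(1, targets.unsqueeze(1), 0.0)

    # Batch-averaged entropy of the incorrect-class distribution,
    # balanced by the number of incorrect classes.
    h = -(p_wrong * torch.log(p_wrong + eps)).sum(dim=1).mean() / (k - 1)
    return ce + gamma * h
```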
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore the balance in imbalanced image data by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with far fewer parameters compared with the convolutional-GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
- VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning [38.33920705605981]
We propose a method that can naturally incorporate class imbalance into the Active Learning framework.
We show that our method can be applied to classification tasks on multiple different datasets.
arXiv Detail & Related papers (2020-03-25T07:34:06Z)