Deep Synthetic Minority Over-Sampling Technique
- URL: http://arxiv.org/abs/2003.09788v1
- Date: Sun, 22 Mar 2020 02:44:46 GMT
- Title: Deep Synthetic Minority Over-Sampling Technique
- Authors: Hadi Mansourifar, Weidong Shi
- Abstract summary: We adapt the SMOTE idea within a deep learning architecture.
Deep SMOTE can outperform traditional SMOTE in terms of precision, F1 score, and Area Under the Curve (AUC) in the majority of test cases.
- Score: 3.3707422585608953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic Minority Over-sampling Technique (SMOTE) is the most popular
over-sampling method. However, its random nature makes both the synthesized instances
and the resulting imbalanced-classification results unstable: running SMOTE n
different times yields n different sets of synthesized instances and n different
classification results. To address this problem, we adapt the SMOTE idea within a
deep learning architecture. In this method, a deep neural network regression model
is trained on the inputs and outputs of traditional SMOTE. Each input to the
proposed deep regression model is two randomly chosen data points concatenated into
a double-size vector; the corresponding output is a randomly interpolated data
point between the two chosen vectors, with the original dimension. The experimental
results show that Deep SMOTE can outperform traditional SMOTE in terms of
precision, F1 score, and Area Under the Curve (AUC) in the majority of test cases.
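To make the described setup concrete, a minimal PyTorch sketch of the idea might look as follows; the architecture, sizes, and all names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the Deep SMOTE idea: a regression network is trained to map
# a concatenated pair of minority-class points to a SMOTE-style interpolation.
import torch
import torch.nn as nn

def make_smote_pairs(X_min, n_pairs):
    """Build (input, target) pairs the way traditional SMOTE would:
    inputs are two random minority points concatenated (dim 2d),
    targets are a random interpolation between them (dim d)."""
    i = torch.randint(0, X_min.shape[0], (n_pairs,))
    j = torch.randint(0, X_min.shape[0], (n_pairs,))
    x1, x2 = X_min[i], X_min[j]
    lam = torch.rand(n_pairs, 1)            # random interpolation factor in [0, 1]
    inputs = torch.cat([x1, x2], dim=1)     # double-size vector
    targets = x1 + lam * (x2 - x1)          # SMOTE-style synthetic point
    return inputs, targets

d = 8                                        # feature dimension (illustrative)
X_min = torch.randn(100, d)                  # stand-in minority-class data

model = nn.Sequential(                       # hypothetical architecture
    nn.Linear(2 * d, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, d),                        # output has the original dimension
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(1000):
    inp, tgt = make_smote_pairs(X_min, n_pairs=64)
    loss = loss_fn(model(inp), tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, the network maps any fixed pair deterministically to a
# synthetic point, so repeated over-sampling is reproducible.
with torch.no_grad():
    synthetic = model(torch.cat([X_min[:32], X_min[32:64]], dim=1))
```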
Related papers
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
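For reference, the classic SMOTE step that such over-sampling methods build on can be sketched in a few lines of NumPy; the neighbour count and all names here are illustrative, and this is the fixed baseline, not AutoSMOTE itself (which learns these sampling decisions with reinforcement learning).

```python
# Classic SMOTE sketch: each synthetic point lies on the segment between a
# minority sample and one of its k nearest minority-class neighbours.
import numpy as np

def smote_sample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]      # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)     # random base points
    pick = nbrs[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))              # interpolation factor per sample
    return X_min[base] + lam * (X_min[pick] - X_min[base])

X_min = np.random.default_rng(1).normal(size=(30, 4))   # stand-in minority data
X_syn = smote_sample(X_min, n_new=60)
```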
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula-based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
- SMOTified-GAN for class imbalanced pattern classification problems [0.41998444721319217]
We propose a novel two-phase oversampling approach that has the synergy of SMOTE and GAN.
The experimental results show that the sample quality of the minority class(es) is improved on a variety of tested benchmark datasets.
arXiv Detail & Related papers (2021-08-06T06:14:05Z)
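A rough sketch of such a two-phase pipeline, under the assumption that the generator refines SMOTE output rather than transforming pure noise; all architecture and training details below are illustrative, not the paper's implementation.

```python
# Two-phase idea: phase 1 produces SMOTE samples; phase 2 trains a GAN whose
# generator refines those samples so they resemble the real minority class.
import torch
import torch.nn as nn

d = 8
G = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))          # refiner
D = nn.Sequential(nn.Linear(d, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))  # critic
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

X_real = torch.randn(256, d)          # stand-in real minority samples
X_smote = torch.randn(256, d)         # stand-in phase-1 SMOTE samples

for step in range(2000):
    idx = torch.randint(0, 256, (64,))
    real, seed = X_real[idx], X_smote[idx]
    # discriminator step: real minority vs refined SMOTE samples
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(G(seed).detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator step: make refined SMOTE samples look real
    g_loss = bce(D(G(seed)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

X_oversampled = G(X_smote).detach()   # refined synthetic minority samples
```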
- A multi-schematic classifier-independent oversampling approach for imbalanced datasets [0.0]
It is evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers.
Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS.
ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm.
arXiv Detail & Related papers (2021-07-15T14:03:24Z)
- ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables [60.799183326613395]
ARMS is an Antithetic-REINFORCE-based Multi-Sample gradient estimator.
ARMS uses a copula to generate any number of mutually antithetic samples.
We evaluate ARMS on several datasets for training generative models, and our experimental results show that it outperforms competing methods.
arXiv Detail & Related papers (2021-05-28T23:19:54Z)
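A minimal NumPy illustration of the antithetic idea for a single Bernoulli variable; ARMS itself couples any number of samples through a copula, which this two-sample sketch omits.

```python
# The antithetic pair (u, 1 - u) yields two negatively correlated Bernoulli
# samples with the correct marginal, so averaging their REINFORCE terms stays
# unbiased while the variance drops.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # Bernoulli parameter, p = sigmoid(theta)
f = lambda z: (z - 0.7) ** 2              # objective whose expectation we differentiate

def reinforce(z):                         # f(z) * d/dtheta log Bern(z; sigmoid(theta))
    return f(z) * (z - p)                 # score w.r.t. the logit theta is (z - p)

n = 200_000
u = rng.random(n)
z1 = (u < p).astype(float)                # ordinary sample
z2 = ((1.0 - u) < p).astype(float)        # antithetic sample, same marginal

g_iid = 0.5 * (reinforce(z1) + reinforce((rng.random(n) < p).astype(float)))
g_anti = 0.5 * (reinforce(z1) + reinforce(z2))

print("mean iid / antithetic:", g_iid.mean(), g_anti.mean())   # both unbiased
print("var  iid / antithetic:", g_iid.var(), g_anti.var())     # antithetic is lower
```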
- GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods to address the problem, such as the synthetic minority oversampling technique (SMOTE).
In this paper, we propose a Gaussian based minority oversampling technique (GMOTE) with a statistical perspective for imbalanced datasets.
When GMOTE is combined with a classification and regression tree (CART) or a support vector machine (SVM), it shows better accuracy and F1 score.
arXiv Detail & Related papers (2021-05-09T07:04:37Z)
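A toy version of the Gaussian sampling step, assuming a single Gaussian fit to the minority class; GMOTE's adaptation of the tail probability of outliers is omitted here.

```python
# Minimal Gaussian-based over-sampling sketch: fit a Gaussian to the minority
# class and draw synthetic samples from it.
import numpy as np

def gaussian_oversample(X_min, n_new, seed=0):
    rng = np.random.default_rng(seed)
    mu = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False)      # empirical covariance of the minority class
    return rng.multivariate_normal(mu, cov, size=n_new)

X_min = np.random.default_rng(1).normal(size=(40, 5))   # stand-in minority data
X_syn = gaussian_oversample(X_min, n_new=100)
```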
- SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features [0.38073142980733]
We present a novel minority over-sampling method, SMOTE-ENC (SMOTE - Encoded Nominal and Continuous).
Our experiments show that a classification model using the SMOTE-ENC method offers better predictions than a model using SMOTE-NC.
Our proposed method addresses one of the major limitations of the SMOTE-NC algorithm.
arXiv Detail & Related papers (2021-03-13T04:16:17Z)
- The Integrity of Machine Learning Algorithms against Software Defect Prediction [0.0]
This report analyses the performance of the Online Sequential Extreme Learning Machine (OS-ELM) proposed by Liang et al.
OS-ELM trains faster than conventional deep neural networks and always converges to the globally optimal solution.
The analysis is carried out on three NASA projects: KC1, PC4, and PC3.
arXiv Detail & Related papers (2020-09-05T17:26:56Z)
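A compact sketch of the OS-ELM recursion as usually formulated (a random, fixed hidden layer whose output weights are the least-squares solution, updated recursively as new chunks arrive); dimensions and names are illustrative.

```python
# OS-ELM sketch: hidden weights are random and never trained; the output
# weights solve a least-squares problem (hence the global optimum for that
# layer) and are updated by recursive least squares on each new data chunk.
import numpy as np

rng = np.random.default_rng(0)
d, h, m = 10, 50, 2                        # input dim, hidden units, outputs

W = rng.normal(size=(d, h))                # random, fixed hidden weights
b = rng.normal(size=h)
hidden = lambda X: np.tanh(X @ W + b)

# --- initialization phase on a first batch (X0, T0) ---
X0, T0 = rng.normal(size=(100, d)), rng.normal(size=(100, m))
H0 = hidden(X0)
P = np.linalg.inv(H0.T @ H0 + 1e-6 * np.eye(h))   # small ridge for stability
beta = P @ H0.T @ T0                               # least-squares output weights

# --- sequential phase: update on each chunk without revisiting old data ---
def oselm_update(P, beta, X_new, T_new):
    H = hidden(X_new)
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P                    # recursive least squares
    beta = beta + P @ H.T @ (T_new - H @ beta)
    return P, beta

for _ in range(5):
    Xc, Tc = rng.normal(size=(20, d)), rng.normal(size=(20, m))
    P, beta = oselm_update(P, beta, Xc, Tc)

predict = lambda X: hidden(X) @ beta
```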
- RAIN: A Simple Approach for Robust and Accurate Image Classification Networks [156.09526491791772]
It has been shown that the majority of existing adversarial defense methods achieve robustness at the cost of sacrificing prediction accuracy.
This paper proposes a novel preprocessing framework, which we term Robust and Accurate Image classificatioN (RAIN).
RAIN applies randomization over inputs to break the ties between the model forward prediction path and the backward gradient path, thus improving the model robustness.
We conduct extensive experiments on the STL10 and ImageNet datasets to verify the effectiveness of RAIN against various types of adversarial attacks.
arXiv Detail & Related papers (2020-04-24T02:03:56Z)
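One common form of input randomization is a random resize-and-pad transform, sketched below in PyTorch; this is an assumption for illustration, not RAIN's exact preprocessing pipeline.

```python
# Generic input randomization in the spirit described above: a random transform
# applied before the forward pass decouples the gradient an attacker computes
# from the path the deployed model actually uses.
import torch
import torch.nn.functional as F

def random_resize_pad(x, max_shrink=8):
    """x: (N, C, H, W). Resize to a random smaller size, then pad back randomly."""
    n, c, h, w = x.shape
    s = int(torch.randint(0, max_shrink + 1, (1,)))
    x = F.interpolate(x, size=(h - s, w - s), mode="bilinear", align_corners=False)
    top = int(torch.randint(0, s + 1, (1,)))
    left = int(torch.randint(0, s + 1, (1,)))
    return F.pad(x, (left, s - left, top, s - top))  # (l, r, t, b) zero padding

x = torch.randn(4, 3, 96, 96)          # stand-in batch (e.g. STL10-sized inputs)
x_rand = random_resize_pad(x)          # shape is restored to (4, 3, 96, 96)
# logits = model(x_rand)               # the classifier sees a randomized input
```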
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
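A simplified sketch of layer-wise fusion for a single hidden layer, using a hard neuron assignment (a special case of the optimal-transport coupling the paper uses) to align one model's neurons with the other's before averaging; all names and sizes are illustrative.

```python
# Layer-wise fusion of two MLP layers: match each neuron of model B to a
# neuron of model A by incoming-weight similarity, permute B accordingly
# (including the weights that consume B's hidden units), then average.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layer(Wa, Wb, Va, Vb):
    """Wa, Wb: (hidden, in) incoming weights; Va, Vb: (out, hidden) outgoing."""
    # cost of matching neuron j of B to neuron i of A: squared distance
    # between their incoming weight vectors
    cost = ((Wa[:, None, :] - Wb[None, :, :]) ** 2).sum(-1)
    _, perm = linear_sum_assignment(cost)   # perm[i] = B neuron matched to A neuron i
    Wb_aligned = Wb[perm]                   # permute B's hidden neurons
    Vb_aligned = Vb[:, perm]                # and the next layer's input weights
    return 0.5 * (Wa + Wb_aligned), 0.5 * (Va + Vb_aligned)

rng = np.random.default_rng(0)
Wa, Wb = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
Va, Vb = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
W_fused, V_fused = fuse_layer(Wa, Wb, Va, Vb)
```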