UGRWO-Sampling for COVID-19 dataset: A modified random walk
under-sampling approach based on graphs to imbalanced data classification
- URL: http://arxiv.org/abs/2002.03521v3
- Date: Thu, 2 Dec 2021 20:43:13 GMT
- Title: UGRWO-Sampling for COVID-19 dataset: A modified random walk
under-sampling approach based on graphs to imbalanced data classification
- Authors: Saeideh Roshanfekr, Shahriar Esmaeili, Hassan Ataeian, and Ali Amiri
- Abstract summary: This paper proposes a new graph-based RWO-Sampling (Random Walk Over-Sampling) method for imbalanced datasets.
Two schemes, one based on under-sampling and one on over-sampling, are introduced to keep the proximity information robust to noise and outliers.
- Score: 2.15242029196761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a new graph-based RWO-Sampling (Random Walk
Over-Sampling) method for imbalanced datasets. In this method, two schemes
based on under-sampling and over-sampling are introduced to keep the proximity
information robust to noise and outliers. After constructing the first graph
on the minority class, RWO-Sampling is applied to the selected samples, while
the rest remain unchanged. The second graph is constructed for the majority
class, and the samples in low-density areas (outliers) are removed. Finally,
majority-class samples in high-density areas are retained, and the rest are
eliminated. Furthermore, through RWO-Sampling, the boundary of the minority
class is expanded without increasing the number of outliers. The method is
evaluated against previous methods on a range of evaluation measures, using
nine continuous-attribute datasets with different over-sampling rates and one
dataset for the diagnosis of COVID-19. The experimental results indicate the
high efficiency and flexibility of the proposed method for the classification
of imbalanced data.
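As a rough illustration of the two-graph scheme described above, the sketch below scores majority samples by local density on a kNN graph, keeps only the high-density ones, and generates synthetic minority samples with the RWO formula x' = x + (sigma/sqrt(n))*r. The kNN density proxy, the keep_ratio parameter, and the helper name ugrwo_sample are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ugrwo_sample(X_min, X_maj, k=5, keep_ratio=0.7, n_new=100, seed=0):
    """Sketch of graph-based under/over-sampling in the spirit of
    UGRWO-Sampling; density is the inverse mean distance to the k
    nearest neighbours, which may differ from the paper's graph."""
    rng = np.random.default_rng(seed)

    # Majority class: score each point by local density on a kNN graph.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_maj)
    dist, _ = nn.kneighbors(X_maj)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)

    # Keep only majority samples in high-density regions; low-density
    # points (likely outliers) are eliminated.
    n_keep = int(keep_ratio * len(X_maj))
    X_maj_kept = X_maj[np.argsort(density)[::-1][:n_keep]]

    # Minority class: RWO-style over-sampling, x' = x + (sigma/sqrt(n)) * r.
    sigma = X_min.std(axis=0)
    base = X_min[rng.integers(0, len(X_min), size=n_new)]
    noise = rng.standard_normal((n_new, X_min.shape[1]))
    X_syn = base + (sigma / np.sqrt(len(X_min))) * noise

    return np.vstack([X_min, X_syn]), X_maj_kept
```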
Related papers
- A Bilevel Optimization Framework for Imbalanced Data Classification [1.6385815610837167]
We propose a new undersampling approach that avoids the pitfalls of noise and overlap caused by synthetic data.
Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss.
Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint's impact on loss and rejects those unable to improve it.
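The paper frames this selection as a bilevel optimization; a greedy stand-in, shown below under that caveat, ranks majority points by their per-sample log-loss under a pilot model and keeps only the hardest ones. The helper name loss_guided_undersample and the logistic-regression pilot are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def loss_guided_undersample(X_maj, X_min, n_keep=None, seed=0):
    """Greedy proxy for loss-guided undersampling: keep the majority
    points with the highest individual log-loss under a pilot model.
    The paper instead solves the selection as a bilevel problem."""
    n_keep = n_keep if n_keep is not None else len(X_min)
    X = np.vstack([X_maj, X_min])
    y = np.r_[np.zeros(len(X_maj)), np.ones(len(X_min))]
    pilot = LogisticRegression(max_iter=1000, random_state=seed).fit(X, y)
    p_maj = pilot.predict_proba(X_maj)[:, 0]        # P(majority) per point
    loss = -np.log(np.clip(p_maj, 1e-12, 1.0))      # per-sample log-loss
    return X_maj[np.argsort(loss)[::-1][:n_keep]]   # hardest points first
```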
arXiv Detail & Related papers (2024-10-15T01:17:23Z)
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Efficient Hybrid Oversampling and Intelligent Undersampling for Imbalanced Big Data Classification [1.03590082373586]
We present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework.
Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets.
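SMOTENN itself is the paper's method; as a single-machine analogue, imbalanced-learn's SMOTEENN combines SMOTE over-sampling with Edited-Nearest-Neighbours cleaning in a similar spirit, without the paper's MapReduce layer:

```python
from collections import Counter

from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

# Toy imbalanced dataset; SMOTEENN oversamples the minority class with
# SMOTE and then removes noisy points with Edited Nearest Neighbours.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```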
arXiv Detail & Related papers (2023-10-09T15:22:13Z)
- BSGAN: A Novel Oversampling Technique for Imbalanced Pattern Recognitions [0.0]
Class imbalance problems (CIP) are among the key challenges in developing unbiased Machine Learning (ML) models for prediction.
CIP occurs when data samples are not equally distributed across two or more classes.
We propose a hybrid oversampling technique that combines the power of Borderline-SMOTE and a Generative Adversarial Network to generate more diverse data.
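Only the Borderline-SMOTE half of BSGAN is easy to sketch with standard tooling; the example below uses imbalanced-learn's BorderlineSMOTE and omits the GAN that the paper trains to diversify the borderline samples.

```python
from collections import Counter

from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification

# Oversample only near the class boundary; the GAN stage of BSGAN,
# which further diversifies these samples, is omitted here.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
sampler = BorderlineSMOTE(kind="borderline-1", random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```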
arXiv Detail & Related papers (2023-05-16T20:02:39Z)
- A Static Analysis of Informed Down-Samples [62.997667081978825]
We study recorded populations from the first generation of genetic programming runs, as well as entirely synthetic populations.
We show that both forms of down-sampling cause greater test coverage loss than standard lexicase selection with no down-sampling.
arXiv Detail & Related papers (2023-04-04T17:34:48Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
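A toy reading of class-conditioned bias mimicking is sketched below: for a chosen class, every other class is subsampled so that its bias-attribute distribution matches the chosen class's. The helper mimic_bias and its interface are assumptions; the paper's procedure repeats this per class and trains on the resulting subsets.

```python
import numpy as np

def mimic_bias(y, b, target_class, seed=0):
    """Subsample every class other than `target_class` so its
    bias-attribute distribution matches the target's; returns the
    indices of the retained samples. Toy version for illustration."""
    rng = np.random.default_rng(seed)
    tgt = np.flatnonzero(y == target_class)
    vals, counts = np.unique(b[tgt], return_counts=True)
    p_tgt = counts / counts.sum()                  # target bias distribution
    keep = list(tgt)
    for c in np.unique(y):
        if c == target_class:
            continue
        cls = np.flatnonzero(y == c)
        groups = {v: cls[b[cls] == v] for v in vals}
        # Largest subsample that realises p_tgt without upsampling.
        n_max = min(len(groups[v]) / p for v, p in zip(vals, p_tgt))
        for v, p in zip(vals, p_tgt):
            keep.extend(rng.permutation(groups[v])[: int(n_max * p)])
    return np.sort(np.asarray(keep))
```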
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
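A minimal sketch of the neighbourhood idea, assuming precomputed features and softmax outputs: each sample's label is verified against the average predicted distribution of its k nearest feature-space neighbours rather than its own (possibly overfit) prediction.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def collective_reliability(features, probs, labels, k=10):
    """For each sample, return the probability that its neighbours'
    averaged predictions assign to its given label; low values flag
    likely noisy labels. Inputs are assumed precomputed."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)
    neigh_probs = probs[idx[:, 1:]].mean(axis=1)   # exclude the sample itself
    return neigh_probs[np.arange(len(labels)), labels]

# Samples with low collective reliability become correction candidates:
# noisy_mask = collective_reliability(feats, softmax_out, y) < 0.5
```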
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme creates an augmented training sample by mixing a pair of existing samples.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
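For reference, vanilla Mixup is a two-line operation; Saliency Grafting departs from it by selecting the mixed regions with saliency maps and calibrating the label weights accordingly.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla Mixup: convex-combine two samples and their one-hot
    labels with a Beta(alpha, alpha) mixing coefficient."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```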
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
- GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods to solve the problem, such as the Synthetic Minority Oversampling Technique (SMOTE).
In this paper, we propose a Gaussian-based minority oversampling technique (GMOTE) with a statistical perspective for imbalanced datasets.
When the GMOTE is combined with classification and regression tree (CART) or support vector machine (SVM), it shows better accuracy and F1-Score.
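A bare-bones Gaussian over-sampler, assuming a single multivariate normal per minority class, looks as follows; GMOTE's adaptation of the tail probability of outliers is intentionally left out of this sketch.

```python
import numpy as np

def gaussian_oversample(X_min, n_new, seed=0):
    """Fit one multivariate normal to the minority class and draw
    synthetic samples from it; a small ridge keeps the covariance
    positive definite. GMOTE's outlier tail-weighting is omitted."""
    rng = np.random.default_rng(seed)
    mu = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False) + 1e-6 * np.eye(X_min.shape[1])
    return rng.multivariate_normal(mu, cov, size=n_new)
```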
arXiv Detail & Related papers (2021-05-09T07:04:37Z)
- Gamma distribution-based sampling for imbalanced data [6.85316573653194]
Imbalanced class distribution is a common problem in many fields, including medical diagnostics and fraud detection.
We propose a novel method for balancing the class distribution in data through intelligent resampling of the minority class instances.
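One plausible reading, sketched below, replaces SMOTE's uniform interpolation weight with a Gamma-distributed one so that synthetic points concentrate near existing minority instances; the shape/scale values and the helper name gamma_oversample are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def gamma_oversample(X_min, n_new, shape=2.0, scale=0.15, k=5, seed=0):
    """SMOTE-like interpolation toward a random neighbour, but with a
    Gamma-distributed (clipped to [0, 1]) interpolation weight so new
    points cluster near the original minority samples."""
    rng = np.random.default_rng(seed)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    g = np.clip(rng.gamma(shape, scale, size=n_new), 0.0, 1.0)[:, None]
    return X_min[base] + g * (X_min[neigh] - X_min[base])
```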
arXiv Detail & Related papers (2020-09-22T06:39:13Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
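A crude stand-in for the idea, under the assumption that k-means centroids are an acceptable compressed training set: each class is replaced by a fixed number of centroids before fitting LDA. The paper derives a principled compression scheme; this sketch only illustrates the sample-reduction effect.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def compress_then_lda(X, y, per_class=50, seed=0):
    """Replace each class (assumed larger than `per_class`) by k-means
    centroids and fit LDA on the much smaller centroid set."""
    Xs, ys = [], []
    for c in np.unique(y):
        km = KMeans(n_clusters=per_class, n_init=10, random_state=seed)
        km.fit(X[y == c])
        Xs.append(km.cluster_centers_)
        ys.append(np.full(per_class, c))
    return LinearDiscriminantAnalysis().fit(np.vstack(Xs), np.concatenate(ys))
```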
arXiv Detail & Related papers (2020-05-08T05:09:08Z)