Data-level hybrid strategy selection for disk fault prediction model
based on multivariate GAN
- URL: http://arxiv.org/abs/2310.06537v1
- Date: Tue, 10 Oct 2023 11:34:53 GMT
- Title: Data-level hybrid strategy selection for disk fault prediction model
based on multivariate GAN
- Authors: Shuangshuang Yuan, Peng Wu and Yuehui Chen
- Abstract summary: Class imbalance is a common problem in classification, where minority-class samples are often more important and more costly to misclassify.
The SMART dataset exhibits an evident class imbalance, comprising a substantial quantity of healthy samples and a comparatively limited number of defective samples.
This dataset serves as a reliable indicator of the disk's health status.
- Score: 7.270429986841776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalance is a common problem in classification, where
minority-class samples are often more important and more costly to misclassify.
It is therefore important to address imbalanced classification. The SMART
dataset exhibits an evident class imbalance, comprising a substantial quantity
of healthy samples and a comparatively limited number of defective samples.
This dataset serves as a reliable indicator of the disk's health status. In
this paper, we obtain the best balanced disk SMART dataset for a specific
classification model by mixing and integrating the data synthesised by
multivariate generative adversarial networks (GANs) to balance the disk SMART
dataset at the data level, and we combine this with genetic algorithms to
obtain higher disk fault prediction accuracy on that classification model.
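
The abstract does not spell out how the synthetic data are mixed or how the genetic algorithm scores a candidate mix, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each GAN has already produced a pool of synthetic minority samples, that a candidate "hybrid strategy" is a vector of mixing weights over those pools, and that fitness is the minority-class F1 of a stand-in nearest-centroid classifier on a validation set. The toy 2-D Gaussian data and all function names (`make_points`, `build_balanced`, `minority_f1`, `select_mix`) are hypothetical.

```python
import random

def make_points(rng, n, center, spread, label):
    """Draw n labelled 2-D Gaussian points (toy stand-in for SMART feature vectors)."""
    return [([rng.gauss(center[0], spread), rng.gauss(center[1], spread)], label)
            for _ in range(n)]

def normalise(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def build_balanced(majority, minority, pools, weights):
    """Top up the minority class with synthetic samples, split across pools by weight."""
    need = len(majority) - len(minority)
    mixed = list(majority) + list(minority)
    for pool, frac in zip(pools, normalise(weights)):
        mixed += pool[:int(frac * need)]  # deterministic pick keeps fitness repeatable
    return mixed

def minority_f1(train, valid):
    """Fit a nearest-centroid classifier on train; return F1 of label 1 on valid."""
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    tp = fp = fn = 0
    for x, y in valid:
        dist = {l: sum((a - b) ** 2 for a, b in zip(x, c)) for l, c in centroids.items()}
        pred = min(dist, key=dist.get)
        tp += (pred == 1 and y == 1)
        fp += (pred == 1 and y == 0)
        fn += (pred == 0 and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def select_mix(majority, minority, pools, valid, generations=15, pop_size=12, seed=0):
    """Genetic search over mixing weights: tournament selection, uniform
    crossover, Gaussian mutation, and one elite carried over per generation."""
    rng = random.Random(seed)
    fitness = lambda w: minority_f1(build_balanced(majority, minority, pools, w), valid)
    population = [[rng.random() + 1e-6 for _ in pools] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        children = [population[0]]  # elitism: the best genome survives unchanged
        while len(children) < pop_size:
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            child = [rng.choice(pair) for pair in zip(a, b)]
            children.append([max(1e-6, g + rng.gauss(0, 0.1)) for g in child])
        population = children
    return normalise(max(population, key=fitness)), history

# Toy data: label 0 = healthy (majority), label 1 = failed (minority).
rng = random.Random(42)
majority = make_points(rng, 200, (0.0, 0.0), 1.0, 0)
minority = make_points(rng, 10, (2.5, 2.5), 1.0, 1)
valid = make_points(rng, 100, (0.0, 0.0), 1.0, 0) + make_points(rng, 20, (2.5, 2.5), 1.0, 1)
# Simulated outputs of three generators of differing quality (assumption).
pools = [
    make_points(rng, 200, (2.5, 2.5), 1.0, 1),  # close to the real minority class
    make_points(rng, 200, (0.5, 0.5), 1.0, 1),  # drifted toward the majority class
    make_points(rng, 200, (4.0, 4.0), 2.0, 1),  # overly dispersed
]
weights, history = select_mix(majority, minority, pools, valid)
print("mixing weights:", weights)
print("best validation F1 per generation:", history)
```

Because the fitness function retrains the classifier for every candidate mix, the selected weights are specific to that classifier, which mirrors the abstract's point that the best balanced dataset is found "for a specific classification model". In practice the nearest-centroid stand-in would be replaced by the actual fault prediction model and the pools by real GAN output.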
Related papers
- Iterative Online Image Synthesis via Diffusion Model for Imbalanced
Classification [29.730360798234294]
We introduce an Iterative Online Image Synthesis framework to address the class imbalance problem in medical image classification.
Our framework incorporates two key modules, namely Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS).
To evaluate the effectiveness of our proposed method in addressing imbalanced classification, we conduct experiments on the HAM10000 and APTOS datasets.
arXiv Detail & Related papers (2024-03-13T10:51:18Z)
- A Survey of Methods for Handling Disk Data Imbalance [10.261915886145214]
This paper provides a comprehensive overview of research in the field of imbalanced data classification.
The Backblaze dataset, a widely used hard disk dataset, contains a small amount of failure data and a large amount of healthy data, and therefore exhibits a serious class imbalance.
arXiv Detail & Related papers (2023-10-13T05:35:13Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Class-Specific Distribution Alignment for Semi-Supervised Medical Image
Classification [14.343079589464994]
Class-Specific Distribution Alignment (CSDA) is a semi-supervised learning framework based on self-training.
We show that our method provides competitive performance on semi-supervised skin disease, thoracic disease, and endoscopic image classification tasks.
arXiv Detail & Related papers (2023-07-29T13:38:19Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer.
We benchmark the generation results on the CIFAR100/CIFAR100LT datasets and show outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains, such as medicine, information retrieval, cybersecurity, and social media, datasets used for inducing classification models often have an unequal distribution of instances across classes.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes the per-class ratio of positive to negative samples during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
- Weighted Least Squares Twin Support Vector Machine with Fuzzy Rough Set
Theory for Imbalanced Data Classification [0.483420384410068]
Support vector machines (SVMs) are powerful supervised learning tools developed to solve classification problems.
We propose an approach, called FRLSTSVM, that efficiently uses fuzzy rough set theory in a weighted least squares twin support vector machine for the classification of imbalanced data.
arXiv Detail & Related papers (2021-05-03T22:33:39Z)
- RA-GCN: Graph Convolutional Network for Disease Prediction Problems with
Imbalanced Data [47.00510780034136]
Class imbalance is a familiar issue in the field of disease prediction.
In this paper, we propose Re-weighted Adversarial Graph Convolutional Network (RA-GCN) to enhance the performance of the graph-based classifier.
We show the superiority of RA-GCN over recent methods on synthetic data and three publicly available medical datasets.
arXiv Detail & Related papers (2021-02-27T14:06:27Z)
- Oversampling Adversarial Network for Class-Imbalanced Fault Diagnosis [12.526197448825968]
The class-imbalance problem requires a robust learning system that can predict and classify the data in a timely manner.
We propose a new adversarial network for simultaneous classification and fault detection.
arXiv Detail & Related papers (2020-08-07T10:12:07Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced
Semi-supervised Learning [126.31716228319902]
We develop the Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)