Noisy Analysis of Quantum SMOTE on Condition Monitoring and Fault Classification in Industrial and Energy Systems
- URL: http://arxiv.org/abs/2601.11423v1
- Date: Fri, 16 Jan 2026 16:44:38 GMT
- Title: Noisy Analysis of Quantum SMOTE on Condition Monitoring and Fault Classification in Industrial and Energy Systems
- Authors: Amit S. Patel, Himanshukumar R. Patel, Bikash K. Behera
- Abstract summary: Imbalanced datasets are a fundamental issue in industrial condition monitoring and fault classification pipelines. This work presents a detailed benchmarking and investigation of classical classifiers under class imbalance mitigation. The results show that QSMOTE consistently corrects distributional skew and significantly enhances the performance of non-linear classifiers.
- Score: 0.5505634045241289
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Imbalanced datasets are a fundamental issue in industrial condition monitoring and fault classification pipelines, causing classical machine learning models to overfit the majority classes while failing to learn the minority fault patterns. This work presents a detailed benchmarking and robustness investigation of classical classifiers under class imbalance mitigation using the Quantum Synthetic Minority Oversampling Technique (QSMOTE) and quantum-inspired perturbations modelled using six noise channels. Four different datasets, the Solar Panel Image Dataset (SPID), the CWRU Bearing Dataset (CWRUBD), the Engine Failure Detection Dataset (EFDD), and the Industrial Fault Detection Dataset (IFDD), are tested across multi-class scenarios to determine the universality of these impacts. The results show that QSMOTE consistently corrects distributional skew and significantly enhances the performance of non-linear classifiers such as Random Forests (RF), Support Vector Machines (SVM), and Decision Trees (DT), yielding improvements of up to 170% on EFDD and achieving near-perfect accuracy ($\geq$0.99) on IFDD. Linear and probabilistic models, such as Linear Regression (LR) and Naive Bayes (NB), produce mixed results, with significant degradation in overlapping feature spaces due to interpolation-induced boundary distortion. A parallel robustness analysis under different noise models reveals that ensemble models (RF) and margin-based learners (SVM) maintain strong resilience, often preserving over 95% of baseline accuracy even under maximum noise. In contrast, NB and DT show substantial instability, especially on high-variance datasets. The findings establish a rigorous baseline for understanding how classical models behave under realistic imbalance and quantum-inspired noise.
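The abstract attributes the degradation of linear and probabilistic models to "interpolation-induced boundary distortion". As a rough illustration of where that interpolation comes from, the following is a minimal sketch of the classical SMOTE oversampling step, not the paper's QSMOTE. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen only for this example.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (classical SMOTE idea).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: three points forming a triangle (hypothetical data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=5)
```

Because every synthetic point sits between two existing minority samples, the new points can land in regions where classes overlap, which is one plausible reading of the boundary distortion the abstract mentions.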
Related papers
- Robust Machine Learning for Regulatory Sequence Modeling under Biological and Technical Distribution Shifts [0.3948325938742681]
We introduce a robustness framework to quantify performance degradation, calibration failures, and uncertainty based reliability. In simulation, motif driven regulatory outputs are generated with cell type specific programs, perturbations, GC bias, depth variation, batch effects, and heteroscedastic noise. Models remain accurate but show higher error, severe variance miscalibration, and coverage collapse under motif effect rewiring and noise dominated regimes.
arXiv Detail & Related papers (2026-01-21T13:15:27Z)
- Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis [23.834741751854448]
A systematic, comparative investigation into the effects of low-quality data reveals a stark spectrum of robustness across modern probabilistic models. We find that autoregressive language models, from token prediction to sequence-to-sequence tasks, are remarkably resilient. Under the same levels of data corruption, class-conditional diffusion models degrade catastrophically.
arXiv Detail & Related papers (2025-12-11T02:10:41Z)
- Dual-granularity Sinkhorn Distillation for Enhanced Learning from Long-tailed Noisy Data [67.25796812343454]
Real-world datasets for deep learning frequently suffer from the co-occurring challenges of class imbalance and label noise. We propose Dual-granularity Sinkhorn Distillation (D-SINK), a novel framework that enhances dual robustness by distilling and integrating complementary insights. Experiments on benchmark datasets demonstrate that D-SINK significantly improves robustness and achieves strong empirical performance in learning from long-tailed noisy data.
arXiv Detail & Related papers (2025-10-09T13:05:27Z)
- Learning Robust Diffusion Models from Imprecise Supervision [75.53546939251146]
DMIS is a unified framework for training robust Conditional Diffusion Models from Imprecise Supervision. Our framework is derived from likelihood and decomposes the objective into generative and classification components. Experiments on diverse forms of imprecise supervision, covering image generation, weakly supervised learning, and dataset condensation, demonstrate that DMIS consistently produces high-quality and class-discriminative samples.
arXiv Detail & Related papers (2025-10-03T14:00:32Z)
- Physics-Informed Multimodal Bearing Fault Classification under Variable Operating Conditions using Transfer Learning [0.46085106405479537]
This study proposes a physics-informed multimodal convolutional neural network (CNN) with a late fusion architecture. The model incorporates a novel physics-informed loss function that penalizes physically implausible predictions. Experiments on the Paderborn University dataset demonstrate that the proposed physics-informed approach consistently outperforms a non-physics-informed baseline.
arXiv Detail & Related papers (2025-08-11T01:32:09Z)
- Enhancing Crash Frequency Modeling Based on Augmented Multi-Type Data by Hybrid VAE-Diffusion-Based Generative Neural Networks [25.772405506451204]
A key challenge in crash frequency modelling is the prevalence of excessive zero observations. We propose a hybrid VAE-Diffusion neural network, designed to reduce zero observations. We assess the synthetic data quality generated by this model through metrics like similarity, accuracy, diversity, and structural consistency.
arXiv Detail & Related papers (2025-01-17T07:53:27Z)
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z)
- Fault Detection and Diagnosis with Imbalanced and Noisy Data: A Hybrid Framework for Rotating Machinery [2.580765958706854]
Fault diagnosis plays an essential role in reducing the maintenance costs of rotating machinery manufacturing systems.
Traditional Fault Detection and Diagnosis (FDD) frameworks perform poorly under real-world conditions.
This paper proposes a hybrid framework which uses the three aforementioned components to achieve an effective signal-based FDD system.
arXiv Detail & Related papers (2022-02-09T01:09:59Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, that is, it produces high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.