SCLIFD: Supervised Contrastive Knowledge Distillation for Incremental Fault Diagnosis under Limited Fault Data
- URL: http://arxiv.org/abs/2302.05929v1
- Date: Sun, 12 Feb 2023 14:50:12 GMT
- Title: SCLIFD: Supervised Contrastive Knowledge Distillation for Incremental Fault Diagnosis under Limited Fault Data
- Authors: Peng Peng, Hanrong Zhang, Mengxuan Li, Gongzhuang Peng, Hongwei Wang,
Weiming Shen
- Abstract summary: It is difficult to extract discriminative features from limited fault data.
A well-trained model must be retrained from scratch to classify the samples from new classes.
The model's decisions are biased toward new classes due to class imbalance.
- Score: 8.354404360859263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent fault diagnosis has made extraordinary advances in recent
years. Nonetheless, few works tackle class-incremental learning for fault diagnosis
under limited fault data, i.e., imbalanced and long-tailed fault diagnosis, which
raises several notable challenges. First, it is difficult to extract discriminative
features from limited fault data. Second, a well-trained model must be retrained
from scratch to classify samples from new classes, incurring a high computational
burden and time cost. Third, the model may suffer from catastrophic forgetting when
trained incrementally. Finally, the model's decisions are biased toward new classes
due to class imbalance. These problems can consequently degrade the performance of
fault diagnosis models. Accordingly, we introduce a supervised contrastive knowledge
distillation for incremental fault diagnosis under limited fault data (SCLIFD)
framework to address these issues, extending the classical incremental classifier
and representation learning (iCaRL) framework from three perspectives. First, we
adopt supervised contrastive knowledge distillation (KD) to enhance representation
learning under limited fault data. Second, we propose a novel prioritized exemplar
selection method, adaptive herding (AdaHerding), to restrict the growth of the
computational burden; combined with KD, it also alleviates catastrophic forgetting.
Third, we adopt a cosine classifier to mitigate the adverse impact of class
imbalance. We conduct extensive experiments on simulated and real-world industrial
processes under different imbalance ratios. Experimental results show that SCLIFD
outperforms existing methods by a large margin.
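The abstract describes the method only at a high level. As a concrete illustration of its first ingredient, below is a minimal PyTorch-style sketch of a supervised contrastive loss paired with a feature-level distillation term that ties the current model to the frozen previous-phase model. This is a sketch in the spirit of the description, not the authors' code: the function names, the cosine-distance form of the distillation term, and the temperature and loss-weight values are all assumptions.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020) over a batch.

    features: (N, D) L2-normalized embeddings; labels: (N,) class ids.
    Anchors are pulled toward same-class samples and pushed from the rest.
    """
    n = features.size(0)
    sim = features @ features.T / temperature           # (N, N) similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float('-inf'))     # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(1).clamp(min=1)           # avoid divide-by-zero
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts)
    # Average over anchors that have at least one same-class positive.
    return per_anchor[pos_mask.any(1)].mean()

def feature_kd_loss(student_feats, teacher_feats):
    """Feature-level distillation: keep the new model's normalized embeddings
    close to those of the frozen previous-phase model (cosine-distance form)."""
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    return (1.0 - (s * t).sum(dim=1)).mean()

# Combined incremental-phase objective (the 0.5 weight is an assumption):
# loss = supcon_loss(z_new, y) + 0.5 * feature_kd_loss(z_new, z_old.detach())
```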
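The abstract names AdaHerding but gives no algorithmic detail. For orientation, here is the classic herding exemplar selection from iCaRL, the procedure AdaHerding adapts: it greedily keeps the samples whose running mean best approximates the class mean in feature space, so a small exemplar memory still represents each old class well. Treat this as background on the baseline, not as the paper's prioritized variant.

```python
import torch

def herding_select(features, m):
    """iCaRL-style herding: greedily pick m exemplars whose running mean
    best approximates the class mean in embedding space.

    features: (N, D) embeddings of one class's samples; returns m indices.
    Assumes m <= N.
    """
    mu = features.mean(dim=0)                     # class prototype
    running_sum = torch.zeros_like(mu)
    selected, candidates = [], set(range(features.size(0)))
    for k in range(1, m + 1):
        best_i, best_dist = None, float('inf')
        for i in candidates:
            # Distance of the would-be exemplar mean from the prototype.
            dist = torch.norm(mu - (running_sum + features[i]) / k).item()
            if dist < best_dist:
                best_i, best_dist = i, dist
        selected.append(best_i)
        running_sum += features[best_i]
        candidates.remove(best_i)
    return selected
```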
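The third ingredient, the cosine classifier, is a standard remedy for class-imbalance bias: logits become scaled cosine similarities, so weight-vector norms (which tend to grow with class frequency) cannot skew decisions toward the exemplar-rich new classes. A minimal sketch follows; the scale value of 16.0 and making it learnable are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Logits are scaled cosine similarities between the embedding and
    per-class weight vectors, removing the norm bias a standard linear
    head accumulates toward frequent classes."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(scale))  # learnable temperature

    def forward(self, x):
        x = F.normalize(x, dim=1)               # unit-norm embeddings
        w = F.normalize(self.weight, dim=1)     # unit-norm class weights
        return self.scale * x @ w.T             # (N, num_classes) logits
```

The scale factor matters because raw cosine logits live in [-1, 1], which is too flat for softmax training without a temperature.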
Related papers
- Long-tailed Medical Diagnosis with Relation-aware Representation Learning and Iterative Classifier Calibration [14.556686415877602]
We propose a new Long-tailed Medical Diagnosis (LMD) framework for balanced medical image classification on long-tailed datasets.
Our framework significantly surpasses state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-05T14:57:23Z)
- Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation [9.560742599396411]
Class-incremental fault diagnosis requires a model to adapt to new fault classes while retaining previous knowledge.
We introduce a Supervised Contrastive knowledge distiLlation for class Incremental Fault Diagnosis framework.
arXiv Detail & Related papers (2025-01-16T13:20:29Z)
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning [8.016378373626084]
Modern industrial environments demand FD methods that can handle new fault types, dynamic conditions, large-scale data, and provide real-time responses with minimal prior information.
We propose SRTFD, a scalable real-time fault diagnosis framework that enhances online continual learning (OCL) with three critical methods.
Experiments on a real-world dataset and two public simulated datasets demonstrate SRTFD's effectiveness and potential for providing advanced, scalable, and precise fault diagnosis in modern industrial systems.
arXiv Detail & Related papers (2024-08-11T03:26:22Z)
- Error Detection and Constraint Recovery in Hierarchical Multi-Label Classification without Prior Knowledge [2.0007789979629784]
We present an approach based on Error Detection Rules (EDR) that allow for learning explainable rules about the failure modes of machine learning models.
We show that our approach is effective in detecting machine learning errors and recovering constraints, is noise tolerant, and can function as a source of knowledge for neurosymbolic models on multiple datasets.
arXiv Detail & Related papers (2024-07-21T15:12:19Z)
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving properties of the Q-network during training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Causal Disentanglement Hidden Markov Model for Fault Diagnosis [55.90917958154425]
We propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism.
Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors.
To expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments.
arXiv Detail & Related papers (2023-08-06T05:58:45Z)
- Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading [72.45699658852304]
This paper proposes a novel approach to train a generative Diffusion Autoencoder model as an unsupervised feature extractor.
We model fracture grading as a continuous regression, which is more reflective of the smooth progression of fractures.
Importantly, the generative nature of our method allows us to visualize different grades of a given vertebra, providing interpretability and insight into the features that contribute to automated grading.
arXiv Detail & Related papers (2023-03-21T17:16:01Z)
- A New Knowledge Distillation Network for Incremental Few-Shot Surface Defect Detection [20.712532953953808]
This paper proposes a new knowledge distillation network called Dual Knowledge Align Network (DKAN).
The proposed DKAN method follows a pretraining-finetuning transfer learning paradigm, and a knowledge distillation framework is designed for fine-tuning.
Experiments have been conducted on the incremental Few-shot NEU-DET dataset and results show that DKAN outperforms other methods on various few-shot scenes.
arXiv Detail & Related papers (2022-09-01T15:08:44Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.