Transferring self-supervised pre-trained models for SHM data anomaly detection with scarce labeled data
- URL: http://arxiv.org/abs/2412.03880v1
- Date: Thu, 05 Dec 2024 05:25:30 GMT
- Authors: Mingyuan Zhou, Xudong Jian, Ye Xia, Zhilu Lai
- Abstract summary: Self-supervised learning (SSL) is an emerging paradigm that combines unsupervised pre-training and supervised fine-tuning.
SSL techniques boost data anomaly detection performance, achieving increased F1 scores compared to conventional supervised training.
This work demonstrates the effectiveness and superiority of SSL techniques on large-scale SHM data, providing an efficient tool for preliminary anomaly detection with scarce label information.
- Abstract: Structural health monitoring (SHM) has experienced significant advancements in recent decades, accumulating massive monitoring data. Data anomalies inevitably exist in monitoring data, posing significant challenges to their effective utilization. Recently, deep learning has emerged as an efficient and effective approach for anomaly detection in bridge SHM. Despite its progress, many deep learning models require large amounts of labeled data for training. The process of labeling data, however, is labor-intensive, time-consuming, and often impractical for large-scale SHM datasets. To address these challenges, this work explores the use of self-supervised learning (SSL), an emerging paradigm that combines unsupervised pre-training and supervised fine-tuning. The SSL-based framework aims to learn from only a very small quantity of labeled data by fine-tuning, while making the best use of the vast amount of unlabeled SHM data by pre-training. Mainstream SSL methods are compared and validated on the SHM data of two in-service bridges. Comparative analysis demonstrates that SSL techniques boost data anomaly detection performance, achieving increased F1 scores compared to conventional supervised training, especially given a very limited amount of labeled data. This work demonstrates the effectiveness and superiority of SSL techniques on large-scale SHM data, providing an efficient tool for preliminary anomaly detection with scarce label information.
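To make the pre-train/fine-tune workflow concrete, here is a minimal PyTorch-style sketch. It is not the authors' implementation: the encoder architecture, masking ratio, window length, and the number of anomaly classes are illustrative assumptions. Pre-training reconstructs masked unlabeled SHM signal windows; fine-tuning trains a small classifier head on the few labeled windows.

```python
# Minimal sketch of the SSL workflow described above: masked-reconstruction
# pre-training on unlabeled SHM signals, then supervised fine-tuning with
# scarce labels. Encoder size, masking ratio, and the 7 anomaly classes are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):           # x: (batch, 1, window_len)
        return self.net(x)          # (batch, dim)

def pretrain_step(encoder, decoder, x, mask_ratio=0.5):
    """Self-supervised step: reconstruct the signal from a masked copy."""
    mask = (torch.rand_like(x) > mask_ratio).float()
    z = encoder(x * mask)
    x_hat = decoder(z).view_as(x)
    return nn.functional.mse_loss(x_hat, x)

def finetune_step(encoder, head, x, y):
    """Supervised step on the few labeled windows (y: anomaly class ids)."""
    logits = head(encoder(x))
    return nn.functional.cross_entropy(logits, y)

encoder = Encoder()
decoder = nn.Linear(128, 1024)   # reconstruction head, used in pre-training only
head = nn.Linear(128, 7)         # e.g. 7 anomaly categories (assumption)
```

In the setting the abstract describes, the pre-training loss would run over the vast unlabeled monitoring archive, and only the fine-tuning step ever touches labeled windows.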
Related papers
- Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data (arXiv, 2024-11-27)
Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data.
This study introduces semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data.
Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification; a minimal sketch of this loop follows below.
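As a rough illustration of the self-training idea (the confidence threshold and single-round structure below are assumptions, not the paper's exact procedure): a model trained on the small labeled set pseudo-labels its most confident unlabeled predictions and retrains on the enlarged set.

```python
# Hedged sketch of one self-training round: pseudo-label confident unlabeled
# samples with the current CNN and fold them into the labeled pool. The 0.95
# threshold and single round shown here are illustrative assumptions.
import torch

@torch.no_grad()
def pseudo_label(model, unlabeled_x, threshold=0.95):
    probs = torch.softmax(model(unlabeled_x), dim=1)
    conf, pred = probs.max(dim=1)
    keep = conf >= threshold                 # keep only confident predictions
    return unlabeled_x[keep], pred[keep]

def self_training_round(model, optimizer, labeled_x, labeled_y, unlabeled_x):
    px, py = pseudo_label(model, unlabeled_x)
    x = torch.cat([labeled_x, px])           # enlarged training set
    y = torch.cat([labeled_y, py])
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```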
- A Survey of the Self Supervised Learning Mechanisms for Vision Transformers (arXiv, 2024-08-30)
The application of self supervised learning (SSL) in vision tasks has gained significant attention.
We develop a comprehensive taxonomy that systematically classifies SSL techniques.
We discuss the motivations behind SSL, review popular pre-training tasks, and highlight the challenges and advancements in this field.
- SPEED: Scalable Preprocessing of EEG Data for Self-Supervised Learning (arXiv, 2024-08-15)
We propose a Python-based EEG preprocessing pipeline optimized for self-supervised learning.
This optimization stabilizes self-supervised training and enhances performance on downstream tasks; a sketch of a typical pipeline of this kind appears below.
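SPEED's exact steps are not given in this summary; the following is a hedged sketch of the kind of pipeline such work typically standardizes (band-pass filtering, resampling, per-channel normalization), written with SciPy. All cutoffs and sampling rates are assumptions.

```python
# Illustrative EEG-style preprocessing pipeline (band-pass filter, resample,
# per-channel z-score). Cutoff frequencies and rates are assumptions; SPEED's
# actual steps may differ.
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def preprocess(eeg, fs=512, fs_out=128, band=(0.5, 45.0)):
    """eeg: (channels, samples) array sampled at fs Hz."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    x = filtfilt(b, a, eeg, axis=-1)                   # zero-phase band-pass
    x = resample_poly(x, up=fs_out, down=fs, axis=-1)  # downsample to fs_out
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True) + 1e-8
    return (x - mean) / std                            # per-channel z-score
```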
- On Pretraining Data Diversity for Self-Supervised Learning (arXiv, 2024-03-20)
We explore the impact of training with more diverse datasets on the performance of self-supervised learning (SSL) under a fixed computational budget.
Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal.
- Self-Supervised Learning for Data Scarcity in a Fatigue Damage Prognostic Problem (arXiv, 2023-01-20)
Self-Supervised Learning is a sub-category of unsupervised learning approaches.
This paper investigates whether pre-training DL models in a self-supervised way on unlabelled sensor data can be useful for Remaining Useful Life (RUL) estimation.
Results show that the self-supervised pre-trained models significantly outperform the non-pre-trained models on the downstream RUL prediction task.
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency (arXiv, 2022-09-26)
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants, as sketched below.
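A hedged reading of that objective in code (the augmentation function and the KL-based inconsistency measure are assumptions; MaxMatch's exact formulation may differ): among K augmented views of each unlabeled sample, backpropagate only through the view that disagrees most with the prediction on the original.

```python
# Sketch of worst-case consistency: of K augmentations per unlabeled sample,
# penalize only the one whose prediction diverges most from the original's.
# The KL divergence measure and k=4 are illustrative assumptions.
import torch
import torch.nn.functional as F

def worst_case_consistency(model, x, augment, k=4):
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)        # target: prediction on original
    losses = []
    for _ in range(k):
        q_log = F.log_softmax(model(augment(x)), dim=1)
        # per-sample KL(p || q), summed over classes
        losses.append(F.kl_div(q_log, p, reduction="none").sum(dim=1))
    losses = torch.stack(losses, dim=0)        # (k, batch)
    return losses.max(dim=0).values.mean()     # keep worst view per sample
```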
- Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data (arXiv, 2022-08-03)
Time series anomaly detection (TSAD) is an important data mining task with numerous applications in the IoT era.
Deep TSAD methods typically rely on a clean training dataset that is not polluted by anomalies to learn the "normal profile" of the underlying dynamics.
We propose a model-agnostic method which can effectively improve the robustness of learning mainstream deep TSAD models with potentially contaminated data.
- Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning (arXiv, 2022-06-07)
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative inconsistency-based virtual adversarial algorithm to further investigate the potential superiority of combining SSL and AL.
Two real-world case studies demonstrate the practical industrial value of applying and deploying the proposed data sampling algorithm.
- Scale-Equivalent Distillation for Semi-Supervised Object Detection (arXiv, 2022-03-23)
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods encounter, supported by empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
- Self-Tuning for Data-Efficient Deep Learning (arXiv, 2021-02-25)
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
This list is automatically generated from the titles and abstracts of the papers in this site.