Related papers: TL-CLIP: A Power-specific Multimodal Pre-trained Visual Foundation Model for Transmission Line Defect Recognition

TL-CLIP: A Power-specific Multimodal Pre-trained Visual Foundation Model for Transmission Line Defect Recognition

URL: http://arxiv.org/abs/2411.11370v1
Date: Mon, 18 Nov 2024 08:32:51 GMT
Title: TL-CLIP: A Power-specific Multimodal Pre-trained Visual Foundation Model for Transmission Line Defect Recognition
Authors: Ke Zhang, Zhaoye Zheng, Yurong Guo, Jiacun Wang, Jiyuan Yang, Yangjie Xiao,
Abstract summary: We propose a two-stage transmission-line-oriented contrastive language-image pre-training (TL-CLIP) framework for transmission line defect recognition. The pre-training process employs a novel power-specific multimodal algorithm assisted with two power-specific pre-training tasks for better modeling the power-related semantic knowledge. Experimental results demonstrate that the proposed method significantly improves the performance of transmission line defect recognition in both classification and detection tasks.
Score: 3.361647807059187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transmission line defect recognition models have traditionally used general pre-trained weights as the initial basis for their training. These models often suffer weak generalization capability due to the lack of domain knowledge in the pre-training dataset. To address this issue, we propose a two-stage transmission-line-oriented contrastive language-image pre-training (TL-CLIP) framework, which lays a more effective foundation for transmission line defect recognition. The pre-training process employs a novel power-specific multimodal algorithm assisted with two power-specific pre-training tasks for better modeling the power-related semantic knowledge contained in the inspection data. To fine-tune the pre-trained model, we develop a transfer learning strategy, namely fine-tuning with pre-training objective (FTP), to alleviate the overfitting problem caused by limited inspection data. Experimental results demonstrate that the proposed method significantly improves the performance of transmission line defect recognition in both classification and detection tasks, indicating clear advantages over traditional pre-trained models in the scene of transmission line inspection.

Related papers

Leveraging Self-Supervised Features for Efficient Flooded Region Identification in UAV Aerial Images [0.0]
This work focuses on leveraging self-supervised features to accurately identify flooded regions in UAV aerial images.<n>We evaluate the effectiveness of features from the DINOv2 model, trained on non-aerial images, for segmenting aerial images.
arXiv Detail & Related papers (2025-07-07T12:02:35Z)
SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model [52.47816604709358]
Video anomaly detection (VAD) aims to identify unexpected events in videos and has wide applications in safety-critical domains.<n> vision-language models (VLMs) have demonstrated strong multimodal reasoning capabilities, offering new opportunities for anomaly detection.<n>We propose SlowFastVAD, a hybrid framework that integrates a fast anomaly detector with a slow anomaly detector.
arXiv Detail & Related papers (2025-04-14T15:30:03Z)
Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAVTarget Detection [18.776245480405958]
Infrared unmanned aerial vehicle (UAV) images captured using thermal detectors are often affected by temperature dependent lowfrequency nonuniformity.<n>We present a detection-friendly union framework that simultaneously addresses both infrared and UAV target detection tasks.<n>We introduce a detection-guided self-supervised loss to reduce feature discrepancies between the two tasks, thereby enhancing detection robustness to varying nonuniformity levels.
arXiv Detail & Related papers (2025-04-05T01:29:22Z)
Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection [4.081433571732692]
Unsupervised visual defect detection is critical in industrial applications.<n>We propose a novel approach using an enhanced VQ-VAE framework optimized for unsupervised defect detection.
arXiv Detail & Related papers (2025-01-15T22:26:26Z)
Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.<n>In this paper, we investigate how detection performance varies across model backbones, types, and datasets.<n>We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adervsarial images dataset with Diverse hArmful Responses (RADAR) We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector that distilled from the hidden states of Visual Language Models (VLMs) to achieve the detection of adversarial images against benign ones in the input.
arXiv Detail & Related papers (2024-10-30T10:33:10Z)
Adaptive Data Transport Mechanism for UAV Surveillance Missions in Lossy Environments [2.700610024690147]
Unmanned Aerial Vehicles (UAVs) play an increasingly critical role in Intelligence, Surveillance, and Reconnaissance (ISR) missions. This paper offers an alternative AI-driven scheduling policy that prioritizes selecting regions of the image that significantly contributes to the mission objective.
arXiv Detail & Related papers (2024-09-30T18:22:58Z)
Non-asymptotic Convergence of Training Transformers for Next-token Prediction [48.9399496805422]
Transformers have achieved extraordinary success in modern machine learning due to their excellent ability to handle sequential data. This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer. We show that the trained transformer presents non-token prediction ability with dataset shift.
arXiv Detail & Related papers (2024-09-25T20:22:06Z)
Novel Saliency Analysis for the Forward Forward Algorithm [0.0]
We introduce the Forward Forward algorithm into neural network training. This method involves executing two forward passes the first with actual data to promote positive reinforcement, and the second with synthetically generated negative data to enable discriminative learning. To overcome the limitations inherent in traditional saliency techniques, we developed a bespoke saliency algorithm specifically tailored for the Forward Forward framework.
arXiv Detail & Related papers (2024-09-18T17:21:59Z)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback. Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data. We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR. For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
Self-Supervised Dual Contouring [30.9409064656302]
We propose a self-supervised training scheme for the Neural Dual Contouring meshing framework. We use two novel self-supervised loss functions that encourage consistency between distances to the generated mesh. We demonstrate that our self-supervised losses improve meshing performance in the single-view reconstruction task.
arXiv Detail & Related papers (2024-05-28T12:44:28Z)
Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm [87.47506806135746]
In some applications, edge learning is experiencing a shift in focusing from conventional learning from scratch to new two-stage learning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. It is shown that the proposed joint resource management over the pre-training and fine-tuning stages well balances the system performance trade-off.
arXiv Detail & Related papers (2024-04-01T00:21:11Z)
An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales. We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
ADFA: Attention-augmented Differentiable top-k Feature Adaptation for Unsupervised Medical Anomaly Detection [5.946143723117816]
We propose Attention-Augmented Differentiable top-k Feature Adaptation (ADFA) for medical image anomaly detection. WR50 network pre-trained on ImageNet to extract initial feature representations. We then apply differentiable top-k feature adaptation to train the patch descriptor, mapping the extracted feature representations to a new vector space.
arXiv Detail & Related papers (2023-08-29T13:10:53Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy. We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers. We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
Transfer Learning for Fault Diagnosis of Transmission Lines [55.971052290285485]
A novel transfer learning framework based on a pre-trained LeNet-5 convolutional neural network is proposed. It is able to diagnose faults for different transmission line lengths and impedances by transferring the knowledge from a source neural network to predict a dissimilar target dataset.
arXiv Detail & Related papers (2022-01-20T06:36:35Z)
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED) TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model. Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images. We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.