Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
- URL: http://arxiv.org/abs/2312.04398v2
- Date: Wed, 29 May 2024 15:54:04 GMT
- Title: Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
- Authors: Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah
- Abstract summary: This paper transforms lane rendering image anomaly detection into a classification problem.
It proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing.
Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection.
- Score: 8.042684255871707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, anomalies in lane rendering map images occasionally introduce potential hazards, as they can mislead human drivers and contribute to unsafe driving conditions. To detect these anomalies accurately and effectively, this paper transforms lane rendering image anomaly detection into a classification problem and tackles it with a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using a cross-entropy based loss with label smoothing, and post-processing, leveraging state-of-the-art deep learning techniques, especially Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that it exhibits superior performance in lane rendering image anomaly detection and, notably, that self-supervised pre-training with MiM can greatly enhance detection accuracy while significantly reducing total training time. For instance, the Swin Transformer with Uniform Masking as self-supervised pre-training (Swin-Trans-UM) yielded an accuracy of 94.77% and an improved Area Under The Curve (AUC) score of 0.9743, compared with an accuracy of 94.01% and an AUC of 0.9498 for the pure Swin Transformer without pre-training (Swin-Trans), while the fine-tuning epochs dropped dramatically from 280 to 41. In conclusion, the proposed pipeline, incorporating self-supervised pre-training with MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.
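For concreteness, below is a minimal PyTorch sketch of the customized fine-tuning loss described in the abstract, cross-entropy with label smoothing. This illustrates the general technique only, not the authors' released code; the smoothing value of 0.1 is an assumed default (recent PyTorch versions also expose this directly via F.cross_entropy(..., label_smoothing=0.1)).

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits: torch.Tensor, target: torch.Tensor,
                       smoothing: float = 0.1) -> torch.Tensor:
    """Cross-entropy with label smoothing: the one-hot target is mixed with
    a uniform distribution over classes, discouraging over-confident
    predictions during fine-tuning."""
    log_probs = F.log_softmax(logits, dim=-1)
    # -(1 - eps) * log p(true class) ...
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    # ... minus eps times the mean log-probability over all classes
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()
```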
Related papers
- GTransPDM: A Graph-embedded Transformer with Positional Decoupling for Pedestrian Crossing Intention Prediction [6.327758022051579]
GTransPDM was developed for pedestrian crossing intention prediction by leveraging multi-modal features.
It achieves 92% accuracy on the PIE dataset and 87% accuracy on the JAAD dataset, with a processing time of 0.05 ms.
arXiv Detail & Related papers (2024-09-30T12:02:17Z)
- Batch-oriented Element-wise Approximate Activation for Privacy-Preserving Neural Networks [5.039738753594332]
Fully Homomorphic Encryption (FHE) faces a major challenge: homomorphic operations cannot be easily adapted to non-linear activation calculations.
Batch-oriented element-wise data packing and approximate activation are proposed, training low-degree polynomials to approximate the non-linear activation function ReLU.
Experimental results show that, for ciphertext inference on 4096 input images, inference accuracy improves by 1.65% and amortized inference time is reduced by 99.5% compared with the current most efficient channel-wise method.
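As an illustration of the underlying idea only (not this paper's training procedure), the sketch below fits a low-degree polynomial to ReLU over an assumed activation range, yielding an FHE-friendly surrogate built solely from additions and multiplications.

```python
import numpy as np

# Fit a degree-2 polynomial to ReLU over an assumed activation range;
# FHE schemes support only additions and multiplications, so the fitted
# polynomial can be evaluated directly on ciphertexts.
x = np.linspace(-6.0, 6.0, 2001)      # assumed input range (illustrative)
relu = np.maximum(x, 0.0)
coeffs = np.polyfit(x, relu, deg=2)   # least-squares polynomial fit

def poly_relu(z):
    """FHE-friendly ReLU surrogate using only + and *."""
    return np.polyval(coeffs, z)

print(np.abs(poly_relu(x) - relu).max())  # worst-case error on the range
```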
arXiv Detail & Related papers (2024-03-16T13:26:33Z)
- The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling [78.6155095947769]
Skip-Tuning is a simple yet surprisingly effective training-free method that tunes the skip connections of a pretrained diffusion model.
Our method achieves a 100% FID improvement for pretrained EDM on ImageNet 64, reaching an FID of 1.75 with only 19 NFEs.
While Skip-Tuning increases the score-matching losses in the pixel space, the losses in the feature space are reduced.
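A rough, training-free sketch of the idea, assuming a generic U-Net decoder block and a single constant skip-scaling factor rho (the paper's exact parameterization may differ):

```python
import torch
import torch.nn as nn

class SkipScaledBlock(nn.Module):
    """Illustrative wrapper: scale the long skip connection feeding a
    pretrained U-Net decoder block by a constant rho at sampling time.
    No parameters are trained; `block` and rho are assumptions."""
    def __init__(self, block: nn.Module, rho: float = 0.8):
        super().__init__()
        self.block = block
        self.rho = rho

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Down-weight the skip features before the usual concatenation.
        return self.block(torch.cat([x, self.rho * skip], dim=1))
```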
arXiv Detail & Related papers (2024-02-23T08:05:23Z)
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection, investigating early multi-modal approaches that utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- Cal-DETR: Calibrated Detection Transformer [67.75361289429013]
We propose a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO.
We develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits.
Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections.
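A hypothetical sketch of the idea, assuming uncertainty is taken as the variance of class logits across decoder layers (consistent with the summary above, though the paper's exact formulation may differ):

```python
import torch

def modulate_logits(layer_logits: torch.Tensor) -> torch.Tensor:
    """Uncertainty-guided logit modulation (illustrative only).
    layer_logits: [L, N, C] class logits from L decoder layers.
    High variance across layers is treated as high uncertainty, and the
    final layer's logits are down-weighted accordingly."""
    uncertainty = layer_logits.var(dim=0)        # [N, C]
    confidence = 1.0 - torch.tanh(uncertainty)   # squash into (0, 1]
    return layer_logits[-1] * confidence
```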
arXiv Detail & Related papers (2023-11-06T22:13:10Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Robust Lane Detection through Self Pre-training with Masked Sequential Autoencoders and Fine-tuning with Customized PolyLoss [0.0]
Lane detection is crucial for vehicle localization, making it a foundation for automated driving.
This paper proposes a pipeline of self pre-training with masked sequential autoencoders and fine-tuning with a customized PolyLoss for end-to-end neural network lane detection models.
Experimental results show that, with the proposed pipeline, lane detection model performance can be advanced beyond the state of the art.
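For reference, a minimal PyTorch sketch of the Poly-1 form of PolyLoss (cross-entropy plus the leading polynomial correction term); the customized coefficients used in the paper are not reproduced here:

```python
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits: torch.Tensor, target: torch.Tensor,
                        epsilon: float = 1.0) -> torch.Tensor:
    """Poly-1 loss: cross-entropy plus an epsilon-weighted (1 - p_t) term,
    the first-order member of the PolyLoss family."""
    ce = F.cross_entropy(logits, target, reduction="none")
    pt = F.softmax(logits, dim=-1).gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return (ce + epsilon * (1.0 - pt)).mean()
```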
arXiv Detail & Related papers (2023-05-26T21:36:08Z)
- Detecting Driver Drowsiness as an Anomaly Using LSTM Autoencoders [0.0]
An LSTM autoencoder-based architecture is used for drowsiness detection, with ResNet-34 as the feature extractor.
The proposed model achieves a detection performance of 0.8740 area under the curve (AUC) and provides significant improvements in certain scenarios.
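A minimal sketch of the general architecture, assuming sequences of per-frame ResNet-34 features as input and illustrative layer sizes; anomalous (drowsy) sequences are flagged by high reconstruction error:

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Reconstruct sequences of per-frame features; a high reconstruction
    error marks the sequence as anomalous. Sizes are assumptions."""
    def __init__(self, feat_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, feat_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z, _ = self.encoder(x)       # [B, T, hidden]
        recon, _ = self.decoder(z)   # [B, T, feat_dim]
        return recon

def anomaly_score(model: LSTMAutoencoder, x: torch.Tensor) -> torch.Tensor:
    """Per-sequence mean squared reconstruction error."""
    return ((model(x) - x) ** 2).mean(dim=(1, 2))
```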
arXiv Detail & Related papers (2022-09-12T14:25:07Z)
- GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
We introduce a method, named GradViT, that optimizes random noise into natural-looking images.
We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
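A bare-bones sketch of gradient inversion, optimizing random noise so the gradients it induces match the observed ones; GradViT's full objective additionally uses image priors and ViT-specific loss terms not shown here:

```python
import torch

def invert_gradients(model, loss_fn, labels, target_grads,
                     steps: int = 1000, lr: float = 0.1,
                     shape=(1, 3, 224, 224)) -> torch.Tensor:
    """Recover input images from observed gradients (illustrative only)."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        # Gradients induced by the current image estimate.
        grads = torch.autograd.grad(loss_fn(model(x), labels), params,
                                    create_graph=True)
        # Match them to the observed (target) gradients.
        mismatch = sum((g - t).pow(2).sum()
                       for g, t in zip(grads, target_grads))
        mismatch.backward()
        opt.step()
    return x.detach()
```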
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
- Automatic Detection of Rail Components via A Deep Convolutional Transformer Network [7.557470133155959]
We propose a deep convolutional transformer network-based method to detect multi-class rail components, including the rail, clip, and bolt.
Our proposed method simplifies the detection pipeline by eliminating the need for prior settings such as anchor boxes, aspect ratios, default coordinates, and post-processing.
Results of a comprehensive computational study show that our proposed method outperforms a set of existing state-of-the-art approaches by large margins.
arXiv Detail & Related papers (2021-08-05T07:38:04Z)
- Circumventing Outliers of AutoAugment with Knowledge Distillation [102.25991455094832]
AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks.
This paper delves into its working mechanism and reveals that AutoAugment may remove part of the discriminative information from the training image.
To relieve the resulting inaccuracy of supervision, we employ knowledge distillation, using the output of a teacher model to guide network training.
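For context, a standard knowledge-distillation loss (following Hinton et al.) in PyTorch; the temperature and mixing weight here are illustrative defaults, not values from the paper:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
            target: torch.Tensor, T: float = 4.0,
            alpha: float = 0.9) -> torch.Tensor:
    """Temperature-softened KL term toward the teacher, mixed with the
    usual hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, target)
    return alpha * soft + (1.0 - alpha) * hard
```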
arXiv Detail & Related papers (2020-03-25T11:51:41Z)