Efficient Vision Transformer for Accurate Traffic Sign Detection
- URL: http://arxiv.org/abs/2311.01429v1
- Date: Thu, 2 Nov 2023 17:44:32 GMT
- Title: Efficient Vision Transformer for Accurate Traffic Sign Detection
- Authors: Javad Mirzapour Kaleybar, Hooman Khaloo, Avaz Naghipour
- Abstract summary: This research paper addresses the challenges associated with traffic sign detection in self-driving vehicles and driver assistance systems.
It introduces the application of the Transformer model, particularly the Vision Transformer variants, to tackle this task.
To enhance the efficiency of the Transformer model, the research proposes a novel strategy that integrates a locality inductive bias and a transformer module.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research paper addresses the challenges associated with traffic sign
detection in self-driving vehicles and driver assistance systems. The
development of reliable and highly accurate algorithms is crucial for the
widespread adoption of traffic sign recognition and detection (TSRD) in diverse
real-life scenarios. However, this task is complicated by suboptimal traffic
images affected by factors such as camera movement, adverse weather conditions,
and inadequate lighting. This study specifically focuses on traffic sign
detection methods and introduces the application of the Transformer model,
particularly the Vision Transformer variants, to tackle this task. The
Transformer's attention mechanism, originally designed for natural language
processing, offers improved parallel efficiency. Vision Transformers have
demonstrated success in various domains, including autonomous driving, object
detection, healthcare, and defense-related applications. To enhance the
efficiency of the Transformer model, the research proposes a novel strategy
that integrates a locality inductive bias and a transformer module. This
includes the introduction of the Efficient Convolution Block and the Local
Transformer Block, which effectively capture short-term and long-term
dependency information, thereby improving both detection speed and accuracy.
Experimental evaluations demonstrate the significant advancements achieved by
this approach, particularly when applied to the GTSDB dataset.
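The abstract does not describe the internals of the Efficient Convolution Block or the Local Transformer Block. As a rough illustration only of how a locality inductive bias can be paired with attention, the NumPy sketch below applies a depthwise 3x3 convolution followed by self-attention restricted to non-overlapping windows. The function names, the window size, and the residual wiring are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def depthwise_conv3x3(x, kernels):
    """Depthwise 3x3 convolution, zero-padded, stride 1.

    x: (H, W, C) feature map; kernels: (3, 3, C), one 3x3 filter per channel.
    Each output pixel only sees its 3x3 neighbourhood (the locality bias).
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx, :]
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window, wq, wk, wv):
    """Single-head self-attention inside non-overlapping window x window tiles."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be multiples of window"
    # (H, W, C) -> (num_windows, window * window, C)
    tokens = (x.reshape(h // window, window, w // window, window, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, window * window, c))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
    out = attn @ v
    # (num_windows, window * window, C_out) -> (H, W, C_out)
    c_out = out.shape[-1]
    return (out.reshape(h // window, w // window, window, window, c_out)
               .transpose(0, 2, 1, 3, 4)
               .reshape(h, w, c_out))


def hybrid_block(x, kernels, window, wq, wk, wv):
    """Hypothetical block: residual convolution, then residual windowed attention."""
    x = x + depthwise_conv3x3(x, kernels)
    x = x + window_self_attention(x, window, wq, wk, wv)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 8, 4))
    kernels = rng.standard_normal((3, 3, 4))
    wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
    print(hybrid_block(feats, kernels, 4, wq, wk, wv).shape)
```

Restricting attention to windows makes the cost grow linearly with the number of windows rather than quadratically with the number of pixels, which is one common way to trade global context for speed; whether the paper makes the same trade-off is not stated in the abstract.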
Related papers
- GTransPDM: A Graph-embedded Transformer with Positional Decoupling for Pedestrian Crossing Intention Prediction [6.327758022051579]
GTransPDM was developed for pedestrian crossing intention prediction by leveraging multi-modal features.
It achieves 92% accuracy on the PIE dataset and 87% accuracy on the JAAD dataset, with a processing speed of 0.05 ms.
arXiv Detail & Related papers (2024-09-30T12:02:17Z)
- Object Detection using Oriented Window Learning Vision Transformer: Roadway Assets Recognition [4.465427147188149]
The Oriented Window Learning Vision Transformer (OWL-ViT) offers a novel approach by adapting window orientations to the geometry and existence of objects.
This study leverages OWL-ViT within a one-shot learning framework to recognize transportation infrastructure components, such as traffic signs, poles, pavement, and cracks.
arXiv Detail & Related papers (2024-06-15T18:49:42Z)
- Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers [0.0]
Traffic Sign Recognition (TSR) holds a vital role in advancing driver assistance systems and autonomous vehicles.
This study explores three variants of Vision Transformers (PVT, TNT, LNL) and six convolutional neural networks (AlexNet, ResNet, VGG16, MobileNet, EfficientNet, GoogleNet) as baseline models.
To address the shortcomings of traditional methods, a novel pyramid EATFormer backbone is proposed, amalgamating Evolutionary Algorithms (EAs) with the Transformer architecture.
arXiv Detail & Related papers (2024-04-29T19:18:52Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Vision Transformers for Action Recognition: A Survey [41.69370782177517]
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have proven the efficacy of transformers beyond the image domain to solve numerous video-related tasks.
Human action recognition is receiving special attention from the research community due to its widespread applications.
arXiv Detail & Related papers (2022-09-13T02:57:05Z)
- Pyramid Transformer for Traffic Sign Detection [1.933681537640272]
A novel Pyramid Transformer with locality mechanisms is proposed in this paper.
Specifically, Pyramid Transformer has several spatial pyramid reduction layers to shrink and embed the input image into tokens with rich multi-scale context.
The experiments are conducted on the German Traffic Sign Detection Benchmark (GTSDB).
arXiv Detail & Related papers (2022-07-13T09:21:19Z)
- Learning energy-efficient driving behaviors by imitating experts [75.12960180185105]
This paper examines the role of imitation learning in bridging the gap between control strategies and realistic limitations in communication and sensing.
We show that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations.
arXiv Detail & Related papers (2022-06-28T17:08:31Z)
- XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation [77.09542018140823]
We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem.
TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
arXiv Detail & Related papers (2021-05-28T19:08:43Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)