Pyramid Transformer for Traffic Sign Detection
- URL: http://arxiv.org/abs/2207.06067v1
- Date: Wed, 13 Jul 2022 09:21:19 GMT
- Title: Pyramid Transformer for Traffic Sign Detection
- Authors: Omid Nejati Manzari, Amin Boudesh, Shahriar B. Shokouhi
- Abstract summary: A novel Pyramid Transformer with locality mechanisms is proposed in this paper.
Specifically, Pyramid Transformer has several spatial pyramid reduction layers to shrink and embed the input image into tokens with rich multi-scale context.
The experiments are conducted on the German Traffic Sign Detection Benchmark (GTSDB)
- Score: 1.933681537640272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic sign detection is a vital task in the visual system of self-driving
cars and the automated driving system. Recently, novel Transformer-based models
have achieved encouraging results for various computer vision tasks. However,
we observed that the vanilla ViT cannot yield satisfactory results in traffic
sign detection because the overall size of the datasets is very small and the
class distribution of traffic signs is extremely unbalanced. To overcome this
problem, a novel Pyramid Transformer with locality mechanisms is proposed in
this paper. Specifically, Pyramid Transformer has several spatial pyramid
reduction layers to shrink and embed the input image into tokens with rich
multi-scale context by using atrous convolutions. Moreover, it inherits an
intrinsic scale invariance inductive bias and is able to learn local feature
representation for objects at various scales, thereby enhancing the network
robustness against the size discrepancy of traffic signs. The experiments are
conducted on the German Traffic Sign Detection Benchmark (GTSDB). The results
demonstrate the superiority of the proposed model in the traffic sign detection
tasks. More specifically, Pyramid Transformer achieves 75.6% mAP on GTSDB when
used as the backbone of Cascade R-CNN, surpassing most well-known and widely
used state-of-the-art models.
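The spatial pyramid reduction described in the abstract, in which parallel atrous (dilated) convolutions at several rates gather multi-scale context and the reduced feature map is embedded into tokens, can be sketched as below. This is a minimal illustrative implementation, not the authors' code: the single-channel input, the shared averaging kernel, the dilation rates (1, 2, 3), and the stride-2 subsampling are all assumptions made for the sketch.

```python
def dilated_conv2d(x, kernel, dilation):
    """Naive 'atrous' (dilated) 2-D convolution on a nested list,
    with zero padding so the output keeps the input's spatial size."""
    H, W = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])

    def px(r, c):  # zero-padded read outside the image
        return x[r][c] if 0 <= r < H and 0 <= c < W else 0.0

    out = [[0.0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    # dilation spreads the kernel taps apart,
                    # enlarging the receptive field at no extra cost
                    s += kernel[i][j] * px(r + (i - kh // 2) * dilation,
                                           c + (j - kw // 2) * dilation)
            out[r][c] = s
    return out


def pyramid_reduction(x, kernel, dilations, stride=2):
    """Spatial pyramid reduction sketch: parallel dilated convs at several
    atrous rates produce multi-scale responses; each token keeps one
    channel per rate, and strided subsampling shrinks the spatial grid."""
    maps = [dilated_conv2d(x, kernel, d) for d in dilations]
    tokens = []
    for r in range(0, len(x), stride):
        for c in range(0, len(x[0]), stride):
            tokens.append([m[r][c] for m in maps])  # one multi-scale token
    return tokens


# toy 8x8 "image" and a 3x3 averaging kernel applied at three atrous rates
img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kern = [[1.0 / 9.0] * 3 for _ in range(3)]
tokens = pyramid_reduction(img, kern, dilations=[1, 2, 3])
# 8x8 input, stride 2 -> 16 tokens, each with one channel per atrous rate
```

In the actual model these would be learned convolutions over many channels, but the sketch shows the key idea: each output token summarizes the same location at several receptive-field sizes, which is where the scale-invariance inductive bias comes from.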
Related papers
- Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition.
By using description texts, our method reduces the cross-domain differences between template and real traffic signs.
Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z)
- Traffic Sign Recognition Using Local Vision Transformer [1.8416014644193066]
This paper proposes a novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition.
The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies.
The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy.
arXiv Detail & Related papers (2023-11-11T19:42:41Z)
- Efficient Vision Transformer for Accurate Traffic Sign Detection [0.0]
This research paper addresses the challenges associated with traffic sign detection in self-driving vehicles and driver assistance systems.
It introduces the application of the Transformer model, particularly the Vision Transformer variants, to tackle this task.
To enhance the efficiency of the Transformer model, the research proposes a novel strategy that integrates a locality inductive bias and a transformer module.
arXiv Detail & Related papers (2023-11-02T17:44:32Z)
- Distinguishing a planetary transit from false positives: a Transformer-based classification for planetary transit signals [2.2530415657791036]
We present a new architecture for the automatic classification of transit signals.
Our proposed architecture is designed to capture the most significant features of a transit signal and stellar parameters.
We show that our architecture achieves competitive results concerning the CNNs applied for recognizing exoplanetary transit signals.
arXiv Detail & Related papers (2023-04-27T15:43:25Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers into pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers [5.033948921121557]
In existing approaches, the detection accuracy often degrades with the increasing distance.
This is due to the fact that distant lane and road markings occupy a small number of pixels in the image.
Inverse Perspective Mapping can be used to eliminate the perspective distortion, but the inherent interpolation can lead to artifacts.
arXiv Detail & Related papers (2020-03-19T03:22:52Z)
- Traffic Signs Detection and Recognition System using Deep Learning [0.0]
This paper describes an approach for efficiently detecting and recognizing traffic signs in real-time.
We tackle the traffic sign detection problem using state-of-the-art multi-object detection systems.
This paper focuses on F-RCNN Inception v2 and Tiny YOLO v2, as they achieved the best results.
arXiv Detail & Related papers (2020-03-06T14:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.