Pyramid Transformer for Traffic Sign Detection
- URL: http://arxiv.org/abs/2207.06067v1
- Date: Wed, 13 Jul 2022 09:21:19 GMT
- Title: Pyramid Transformer for Traffic Sign Detection
- Authors: Omid Nejati Manzari, Amin Boudesh, Shahriar B. Shokouhi
- Abstract summary: A novel Pyramid Transformer with locality mechanisms is proposed in this paper.
Specifically, Pyramid Transformer has several spatial pyramid reduction layers to shrink and embed the input image into tokens with rich multi-scale context.
The experiments are conducted on the German Traffic Sign Detection Benchmark (GTSDB)
- Score: 1.933681537640272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic sign detection is a vital task in the visual system of self-driving
cars and the automated driving system. Recently, novel Transformer-based models
have achieved encouraging results for various computer vision tasks. However,
we observed that the vanilla ViT cannot yield satisfactory results in traffic
sign detection because the overall size of the datasets is very small and the
class distribution of traffic signs is extremely unbalanced. To overcome this
problem, a novel Pyramid Transformer with locality mechanisms is proposed in
this paper. Specifically, Pyramid Transformer has several spatial pyramid
reduction layers to shrink and embed the input image into tokens with rich
multi-scale context by using atrous convolutions. Moreover, it inherits an
intrinsic scale invariance inductive bias and is able to learn local feature
representation for objects at various scales, thereby enhancing the network
robustness against the size discrepancy of traffic signs. The experiments are
conducted on the German Traffic Sign Detection Benchmark (GTSDB). The results
demonstrate the superiority of the proposed model in the traffic sign detection
tasks. More specifically, Pyramid Transformer achieves 75.6% mAP on GTSDB when
used as the backbone of Cascade R-CNN, surpassing most well-known and widely
used state-of-the-art models.
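The spatial pyramid reduction described in the abstract, in which parallel atrous (dilated) convolutions at several rates gather multi-scale context and the reduced feature map is embedded into tokens, can be sketched as below. This is a minimal illustrative implementation, not the authors' code: the single-channel input, the shared averaging kernel, the dilation rates (1, 2, 3), and the stride-2 subsampling are all assumptions made for the sketch.

```python
def dilated_conv2d(x, kernel, dilation):
    """Naive 'atrous' (dilated) 2-D convolution on a nested list,
    with zero padding so the output keeps the input's spatial size."""
    H, W = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])

    def px(r, c):  # zero-padded read outside the image
        return x[r][c] if 0 <= r < H and 0 <= c < W else 0.0

    out = [[0.0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    # dilation spreads the kernel taps apart,
                    # enlarging the receptive field at no extra cost
                    s += kernel[i][j] * px(r + (i - kh // 2) * dilation,
                                           c + (j - kw // 2) * dilation)
            out[r][c] = s
    return out


def pyramid_reduction(x, kernel, dilations, stride=2):
    """Spatial pyramid reduction sketch: parallel dilated convs at several
    atrous rates produce multi-scale responses; each token keeps one
    channel per rate, and strided subsampling shrinks the spatial grid."""
    maps = [dilated_conv2d(x, kernel, d) for d in dilations]
    tokens = []
    for r in range(0, len(x), stride):
        for c in range(0, len(x[0]), stride):
            tokens.append([m[r][c] for m in maps])  # one multi-scale token
    return tokens


# toy 8x8 "image" and a 3x3 averaging kernel applied at three atrous rates
img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kern = [[1.0 / 9.0] * 3 for _ in range(3)]
tokens = pyramid_reduction(img, kern, dilations=[1, 2, 3])
# 8x8 input, stride 2 -> 16 tokens, each with one channel per atrous rate
```

In the actual model these would be learned convolutions over many channels, but the sketch shows the key idea: each output token summarizes the same location at several receptive-field sizes, which is where the scale-invariance inductive bias comes from.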
Related papers
- Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition.
By using description texts, our method reduces the cross-domain differences between template and real traffic signs.
Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z)
- Traffic Sign Recognition Using Local Vision Transformer [1.8416014644193066]
This paper proposes a novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition.
The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies.
The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy.
arXiv Detail & Related papers (2023-11-11T19:42:41Z)
- Efficient Vision Transformer for Accurate Traffic Sign Detection [0.0]
This research paper addresses the challenges associated with traffic sign detection in self-driving vehicles and driver assistance systems.
It introduces the application of the Transformer model, particularly the Vision Transformer variants, to tackle this task.
To enhance the efficiency of the Transformer model, the research proposes a novel strategy that integrates a locality inductive bias and a transformer module.
arXiv Detail & Related papers (2023-11-02T17:44:32Z)
- Distinguishing a planetary transit from false positives: a Transformer-based classification for planetary transit signals [2.2530415657791036]
We present a new architecture for the automatic classification of transit signals.
Our proposed architecture is designed to capture the most significant features of a transit signal and stellar parameters.
We show that our architecture achieves competitive results concerning the CNNs applied for recognizing exoplanetary transit signals.
arXiv Detail & Related papers (2023-04-27T15:43:25Z)
- An Extendable, Efficient and Effective Transformer-based Object Detector [95.06044204961009]
We integrate Vision and Detection Transformers (ViDT) to construct an effective and efficient object detector.
ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector.
We extend it to ViDT+ to support joint-task learning for object detection and instance segmentation.
arXiv Detail & Related papers (2022-04-17T09:27:45Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers into pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers [5.033948921121557]
In existing approaches, the detection accuracy often degrades with the increasing distance.
This is due to the fact that distant lane and road markings occupy a small number of pixels in the image.
Inverse Perspective Mapping can be used to eliminate the perspective distortion, but the inherent interpolation can lead to artifacts.
arXiv Detail & Related papers (2020-03-19T03:22:52Z)
- Traffic Signs Detection and Recognition System using Deep Learning [0.0]
This paper describes an approach for efficiently detecting and recognizing traffic signs in real-time.
We tackle the traffic sign detection problem using state-of-the-art multi-object detection systems.
This paper focuses on F-RCNN Inception v2 and Tiny YOLO v2, as they achieved the best results.
arXiv Detail & Related papers (2020-03-06T14:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.