Traffic Sign Recognition Using Local Vision Transformer
- URL: http://arxiv.org/abs/2311.06651v1
- Date: Sat, 11 Nov 2023 19:42:41 GMT
- Title: Traffic Sign Recognition Using Local Vision Transformer
- Authors: Ali Farzipour, Omid Nejati Manzari, Shahriar B. Shokouhi
- Abstract summary: This paper proposes a novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition.
The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies.
The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy.
- Score: 1.8416014644193066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognition of traffic signs is a crucial aspect of self-driving cars and
driver assistance systems, and machine vision tasks such as traffic sign
recognition have gained significant attention. CNNs have been frequently used
in machine vision, but introducing vision transformers has provided an
alternative approach to global feature learning. This paper proposes a novel
model that blends the advantages of both convolutional and
transformer-based networks for traffic sign recognition. The proposed model
includes convolutional blocks for capturing local correlations and
transformer-based blocks for learning global dependencies. Additionally, a
locality module is incorporated to enhance local perception. The performance of
the suggested model is evaluated on the Persian Traffic Sign Dataset and German
Traffic Sign Recognition Benchmark and compared with SOTA convolutional and
transformer-based models. The experimental evaluations demonstrate that the
hybrid network with the locality module outperforms pure transformer-based
models and some of the best convolutional networks in accuracy. Specifically,
our proposed final model reached 99.66% accuracy in the German traffic sign
recognition benchmark and 99.8% in the Persian traffic sign dataset, higher
than the best convolutional models. Moreover, it outperforms existing CNNs and
ViTs while maintaining fast inference speed. Consequently, the proposed model
proves to be significantly faster and more suitable for real-world
applications.
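The hybrid design described in the abstract, local perception via convolutional blocks followed by global dependency modelling via self-attention, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names are hypothetical, and a simple neighbour-averaging step stands in for the convolutional locality module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(tokens, Wq, Wk, Wv):
    # tokens: (N, d) patch embeddings; single-head self-attention
    # models global dependencies across all patches
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def local_mixing(tokens, window=3):
    # locality-module stand-in: each token is averaged with its
    # immediate neighbours, capturing only local correlations
    out = np.zeros_like(tokens)
    half = window // 2
    n = tokens.shape[0]
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out[i] = tokens[lo:hi].mean(axis=0)
    return out

def hybrid_block(tokens, Wq, Wk, Wv):
    # local perception first, then global attention,
    # each with a residual connection
    x = tokens + local_mixing(tokens)
    return x + global_attention(x, Wq, Wk, Wv)
```

The residual connections let the block preserve the input features while adding first local, then global context, mirroring the abstract's claim that the locality module enhances local perception before the transformer blocks learn global dependencies.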
Related papers
- TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition [8.890563785528842]
Current methods for traffic sign recognition rely on traditional deep learning models.
We propose TSCLIP, a robust fine-tuning approach with the contrastive language-image pre-training model.
To the best of the authors' knowledge, TSCLIP is the first contrastive language-image model applied to the worldwide cross-regional traffic sign recognition task.
arXiv Detail & Related papers (2024-09-23T14:51:26Z) - Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition.
By using description texts, our method reduces the cross-domain differences between template and real traffic signs.
Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z) - Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers [0.0]
Traffic Sign Recognition (TSR) holds a vital role in advancing driver assistance systems and autonomous vehicles.
This study explores three variants of Vision Transformers (PVT, TNT, LNL) and six convolutional neural networks (AlexNet, ResNet, VGG16, MobileNet, EfficientNet, GoogLeNet) as baseline models.
To address the shortcomings of traditional methods, a novel pyramid EATFormer backbone is proposed, amalgamating Evolutionary Algorithms (EAs) with the Transformer architecture.
arXiv Detail & Related papers (2024-04-29T19:18:52Z) - Traffic Pattern Classification in Smart Cities Using Deep Recurrent
Neural Network [0.519400993594577]
We propose a novel approach to traffic pattern classification based on deep recurrent neural networks.
The proposed model combines convolutional and recurrent layers to extract features from traffic pattern data.
The results show that the proposed model can accurately classify traffic patterns in smart cities, with precision as high as 95%.
arXiv Detail & Related papers (2024-01-24T20:24:32Z) - Lightweight Vision Transformer with Bidirectional Interaction [63.65115590184169]
We propose a Fully Adaptive Self-Attention (FASA) mechanism for vision transformer to model the local and global information.
Based on FASA, we develop a family of lightweight vision backbones, Fully Adaptive Transformer (FAT) family.
arXiv Detail & Related papers (2023-06-01T06:56:41Z) - Pyramid Transformer for Traffic Sign Detection [1.933681537640272]
A novel Pyramid Transformer with locality mechanisms is proposed in this paper.
Specifically, Pyramid Transformer has several spatial pyramid reduction layers to shrink and embed the input image into tokens with rich multi-scale context.
The experiments are conducted on the German Traffic Sign Detection Benchmark (GTSDB).
arXiv Detail & Related papers (2022-07-13T09:21:19Z) - Dynamic Spatial Sparsification for Efficient Vision Transformers and
Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Efficient Federated Learning with Spike Neural Networks for Traffic Sign
Recognition [70.306089187104]
We introduce powerful Spike Neural Networks (SNNs) into traffic sign recognition for energy-efficient and fast model training.
Numerical results indicate that the proposed federated SNN outperforms traditional federated convolutional neural networks in terms of accuracy, noise immunity, and energy efficiency as well.
arXiv Detail & Related papers (2022-05-28T03:11:48Z) - Robust Semi-supervised Federated Learning for Images Automatic
Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z) - Learning dynamic and hierarchical traffic spatiotemporal features with
Transformer [4.506591024152763]
This paper proposes a novel model, Traffic Transformer, for spatial-temporal graph modeling and long-term traffic forecasting.
Transformer is the most popular framework in Natural Language Processing (NLP).
Analyzing the attention weight matrices reveals the influential parts of road networks, allowing the traffic networks to be learned better.
arXiv Detail & Related papers (2021-04-12T02:29:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.