Related papers: Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

URL: http://arxiv.org/abs/2407.05814v1
Date: Mon, 8 Jul 2024 10:51:03 GMT
Title: Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition
Authors: Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
Abstract summary: We propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition. By using description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
Score: 49.20086587208214
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance MLLM's fine-grained recognition ability of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can stimulate the ability of MLLM to perceive fine-grained traffic sign categories. By using the description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels. We perform comprehensive evaluations on the German traffic sign recognition benchmark dataset, the Belgium traffic sign dataset, and two real-world datasets taken from Japan. The experimental results show that our method significantly enhances the TSR performance.
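
As a rough illustration of the description-text idea in the abstract, the sketch below assembles a few-shot in-context prompt from template sign descriptions covering shape, color, and composition. It is not the authors' implementation: the `TemplateSign` structure, the `describe` and `build_prompt` helpers, the prompt wording, and the example signs are all hypothetical, and the MLLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateSign:
    """A template sign and its key attributes (hypothetical structure)."""
    label: str
    shape: str
    color: str
    composition: str


def describe(sign: TemplateSign) -> str:
    """Render a uniform description text covering shape, color, and composition."""
    return (
        f"Label: {sign.label}. Shape: {sign.shape}. "
        f"Color: {sign.color}. Composition: {sign.composition}."
    )


def build_prompt(templates: list[TemplateSign]) -> str:
    """Assemble a few-shot in-context prompt from template descriptions.

    The wording of the header and question is an assumption, not taken
    from the paper.
    """
    header = "You are given descriptions of candidate traffic signs."
    shots = [f"Example {i}: {describe(t)}" for i, t in enumerate(templates, start=1)]
    question = "Which label best matches the cropped traffic sign in the image?"
    return "\n".join([header, *shots, question])


# Illustrative template signs (made up for this sketch).
templates = [
    TemplateSign("stop", "octagon", "red with white border", "white text 'STOP'"),
    TemplateSign("yield", "inverted triangle", "white with red border", "no symbol"),
]
prompt = build_prompt(templates)
print(prompt)
```

In a full pipeline, a prompt like this would be sent to the MLLM together with the cropped sign image produced by the detection and extraction stage.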

Related papers

Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision [2.0720154517628417]
We propose a novel framework combining open-vocabulary detection and cross-modal learning. For traffic sign detection, our NanoVerse YOLO model integrates a vision-language path aggregation network (RepVL-PAN) and an SPD-Conv module. For traffic sign classification, we designed a Traffic Sign Recognition Multimodal Contrastive Learning model (TSR-MCL). On the TT100K dataset, our method achieves a state-of-the-art 78.4% mAP in the long-tail detection task for all-class recognition.
arXiv Detail & Related papers (2025-07-31T08:23:30Z)
TrafficLLM: Enhancing Large Language Models for Network Traffic Analysis with Generic Traffic Representation [14.470174593447702]
Large language models (LLMs) have shown promising performance in various domains. TrafficLLM introduces a dual-stage fine-tuning framework to learn generic traffic representation from raw traffic data. It achieves F1-scores of 0.9875 and 0.9483, with up to 80.12% and 33.92% better performance than existing detection and generation methods.
arXiv Detail & Related papers (2025-04-05T16:18:33Z)
Strada-LLM: Graph LLM for traffic prediction [62.2015839597764]
A considerable challenge in traffic prediction lies in handling the diverse data distributions caused by vastly different traffic conditions. We propose a graph-aware LLM for traffic prediction that considers proximal traffic information. We adopt a lightweight approach for efficient domain adaptation when facing new data distributions in few-shot fashion.
arXiv Detail & Related papers (2024-10-28T09:19:29Z)
TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition [8.890563785528842]
Current methods for traffic sign recognition rely on traditional deep learning models. We propose TSCLIP, a robust fine-tuning approach with the contrastive language-image pre-training model. To the best of the authors' knowledge, TSCLIP is the first contrastive language-image model used for the worldwide cross-regional traffic sign recognition task.
arXiv Detail & Related papers (2024-09-23T14:51:26Z)
Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition [49.20086587208214]
We propose a new strategy called "think twice before recognizing" to improve fine-grained traffic sign recognition (TSR). Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMM).
arXiv Detail & Related papers (2024-09-03T02:08:47Z)
A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation [53.39174966020085]
Traffic signal control (TSC) is crucial for reducing traffic congestion, which leads to smoother traffic flow, reduced idling time, and mitigated CO2 emissions. In this study, we explore the computer vision approach for TSC that modulates on-road traffic flows through visual observation. We introduce a holistic traffic simulation framework called TrafficDojo towards vision-based TSC and its benchmarking.
arXiv Detail & Related papers (2024-03-11T16:42:29Z)
BjTT: A Large-scale Multimodal Dataset for Traffic Prediction [49.93028461584377]
Traditional traffic prediction methods rely on historical traffic data to predict traffic trends. In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation. We propose ChatTraffic, the first diffusion model for text-to-traffic generation.
arXiv Detail & Related papers (2024-03-08T04:19:56Z)
Traffic Reconstruction and Analysis of Natural Driving Behaviors at Unsignalized Intersections [1.7273380623090846]
This research involved recording traffic at various unsignalized intersections in Memphis, TN, during different times of the day. After manually labeling video data to capture specific variables, we reconstructed traffic scenarios in the SUMO simulation environment. The output data from these simulations offered a comprehensive analysis, including time-space diagrams for vehicle movement, travel time frequency distributions, and speed-position plots to identify bottleneck points.
arXiv Detail & Related papers (2023-12-22T09:38:06Z)
Traffic Sign Recognition Using Local Vision Transformer [1.8416014644193066]
This paper proposes a novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition. The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies. The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy.
arXiv Detail & Related papers (2023-11-11T19:42:41Z)
A Deeply Supervised Semantic Segmentation Method Based on GAN [9.441379867578332]
The proposed model integrates a generative adversarial network (GAN) framework into the traditional semantic segmentation model. The effectiveness of our approach is demonstrated by a significant boost in performance on the road crack dataset.
arXiv Detail & Related papers (2023-10-06T08:22:24Z)
Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.