Traffic Sign Interpretation in Real Road Scene
- URL: http://arxiv.org/abs/2311.10793v2
- Date: Tue, 28 Nov 2023 10:23:46 GMT
- Title: Traffic Sign Interpretation in Real Road Scene
- Authors: Chuang Yang, Kai Zhuang, Mulin Chen, Haozhao Ma, Xu Han, Tao Han,
Changxing Guo, Han Han, Bingxuan Zhao, and Qi Wang
- Abstract summary: We propose a traffic sign interpretation (TSI) task, which aims to interpret semantically interrelated traffic signs into natural language.
The dataset consists of real road-scene images captured on highways and urban roads in China from a driver's perspective.
Experiments on TSI-CN demonstrate that the TSI task is achievable and the TSI architecture can interpret traffic signs from scenes successfully.
- Score: 18.961971178824715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing traffic-sign works detect and recognize individual signs
in isolation, which fails to capture the global semantic logic among signs and
may convey inaccurate traffic instructions. To address these issues, we propose
a traffic sign interpretation (TSI) task, which aims to interpret semantically
interrelated traffic signs (e.g.,~driving-instruction-related texts, symbols,
and guide panels) into natural language, providing accurate instruction support
for autonomous or assisted driving. We also design a multi-task learning
architecture for TSI that detects and recognizes various traffic signs and
interprets them into natural language like a human. Furthermore, the absence of
a publicly available TSI dataset prompts us to build one, namely TSI-CN. The
dataset consists of real road-scene images captured on highways and urban roads
in China from a driver's perspective. It contains rich location labels for
texts, symbols, and guide panels, together with the corresponding
natural-language description labels. Experiments on TSI-CN demonstrate that the
TSI task is achievable and that the TSI architecture can interpret traffic
signs from scenes successfully even when the semantic logic among signs is
complex. The TSI-CN dataset and the source code of the TSI architecture will be
made publicly available after the revision process.
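The key idea in the abstract is that signs should be interpreted jointly rather than one at a time, since elements on the same guide panel form one instruction. The toy sketch below illustrates only that grouping idea; the `Sign` record, the `panel_id` field, and the concatenation rule are illustrative assumptions, not the authors' actual TSI architecture.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sign:
    panel_id: int   # hypothetical: which guide panel this element belongs to
    kind: str       # "text" or "symbol"
    content: str    # recognized string or symbol label

def interpret(signs: List[Sign]) -> str:
    """Compose one instruction per guide panel instead of per sign,
    so semantically linked elements are read together."""
    panels = {}
    for s in signs:
        panels.setdefault(s.panel_id, []).append(s.content)
    # One phrase per panel, panels separated by "; "
    return "; ".join(" ".join(parts) for _, parts in sorted(panels.items()))

signs = [
    Sign(0, "symbol", "left-arrow"),
    Sign(0, "text", "Exit 12"),
    Sign(1, "text", "Speed limit 80"),
]
print(interpret(signs))  # "left-arrow Exit 12; Speed limit 80"
```

Interpreting sign-by-sign would emit "left-arrow" and "Exit 12" as unrelated outputs; grouping by panel recovers the combined instruction, which is the failure mode of individual recognition that the TSI task targets.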
Related papers
- SignEye: Traffic Sign Interpretation from Vehicle First-Person View [43.49612694851131]
Traffic signs play a key role in assisting autonomous driving systems (ADS) by enabling the assessment of vehicle behavior in compliance with traffic regulations.
We introduce a new task: traffic sign interpretation from the vehicle's first-person view, referred to as TSI-FPV.
We also develop a traffic guidance assistant (TGA) scenario application to re-explore the role of traffic signs in ADS.
arXiv Detail & Related papers (2024-11-18T12:12:33Z)
- TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition [8.890563785528842]
Current methods for traffic sign recognition rely on traditional deep learning models.
We propose TSCLIP, a robust fine-tuning approach with the contrastive language-image pre-training model.
To the best of the authors' knowledge, TSCLIP is the first contrastive language-image model used for the worldwide cross-regional traffic sign recognition task.
arXiv Detail & Related papers (2024-09-23T14:51:26Z)
- Trustworthy Image Semantic Communication with GenAI: Explainability, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission.
Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility.
We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z)
- Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition.
By using description texts, our method reduces the cross-domain differences between template and real traffic signs.
Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z)
- SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation [3.9711029428461653]
We introduce a new task named multi-channel sign language translation (MCSLT).
We present a novel metric, SignBLEU, designed to capture multiple signal channels.
We found that SignBLEU consistently correlates better with human judgment than competing metrics.
arXiv Detail & Related papers (2024-06-10T05:01:26Z)
- MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition [94.56755080185732]
We propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information.
Our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation.
arXiv Detail & Related papers (2024-05-31T08:06:05Z)
- Traffic Scenario Logic: A Spatial-Temporal Logic for Modeling and Reasoning of Urban Traffic Scenarios [6.671075180562082]
Traffic Scenario Logic (TSL) is a spatial-temporal logic designed for modeling and reasoning of urban pedestrian-free traffic scenarios.
We implement TSL using Telingo, a solver for temporal programs based on Answer Set Programming, and test it on different urban road layouts.
arXiv Detail & Related papers (2024-05-22T15:06:50Z)
- TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation [101.6042317204022]
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
Existing SLT models usually represent sign visual features in a frame-wise manner.
We develop a novel hierarchical sign video feature learning method via a temporal semantic pyramid network, called TSPNet.
arXiv Detail & Related papers (2020-10-12T05:58:09Z)
- Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.