Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving
- URL: http://arxiv.org/abs/2503.23965v1
- Date: Mon, 31 Mar 2025 11:27:48 GMT
- Title: Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving
- Authors: Miao Fan, Xuxu Kong, Shengtong Xu, Haoyi Xiong, Xiangzeng Liu
- Abstract summary: Real-time traffic light recognition is fundamental for autonomous driving safety and navigation in urban environments. We present ViTLR, a novel video-based end-to-end neural network that processes multiple consecutive frames to achieve robust traffic light detection and state classification. We have successfully integrated ViTLR into an ego-lane traffic light recognition system using HD maps for autonomous driving applications.
- Score: 19.468567166834585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time traffic light recognition is fundamental for autonomous driving safety and navigation in urban environments. While existing approaches rely on single-frame analysis from onboard cameras, they struggle with complex scenarios involving occlusions and adverse lighting conditions. We present ViTLR, a novel video-based end-to-end neural network that processes multiple consecutive frames to achieve robust traffic light detection and state classification. The architecture leverages a transformer-like design with convolutional self-attention modules and is optimized specifically for deployment on the Rockchip RV1126 embedded platform. Extensive evaluations on two real-world datasets demonstrate that ViTLR achieves state-of-the-art performance while maintaining real-time processing capabilities (>25 FPS) on the RV1126's NPU. The system shows superior robustness across temporal stability, varying target distances, and challenging environmental conditions compared to existing single-frame approaches. We have successfully integrated ViTLR into an ego-lane traffic light recognition system using HD maps for autonomous driving applications. The complete implementation, including source code and datasets, is made publicly available to facilitate further research in this domain.
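The released implementation is linked above rather than reproduced here. As a rough illustration of the kind of building block the abstract describes, below is a minimal sketch of a convolutional self-attention layer over a short clip of consecutive frames; the module name, tensor shapes, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only: self-attention with 1x1-conv Q/K/V projections,
# applied jointly over T consecutive frames. Not the ViTLR implementation.
import torch
import torch.nn as nn


class ConvSelfAttention(nn.Module):
    """Single-head self-attention whose projections are 1x1 convolutions.

    Input/output shape: (B, T, C, H, W) -- a clip of T consecutive frames,
    so every spatial position can attend across the whole clip.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape
        flat = x.reshape(b * t, c, h, w)

        def tokens(z: torch.Tensor) -> torch.Tensor:
            # (B*T, C, H, W) -> (B, T*H*W, C): one token per spatio-temporal cell
            return z.reshape(b, t, c, h, w).permute(0, 1, 3, 4, 2).reshape(b, t * h * w, c)

        q, k, v = tokens(self.q(flat)), tokens(self.k(flat)), tokens(self.v(flat))
        attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
        out = (attn @ v).reshape(b, t, h, w, c).permute(0, 1, 4, 2, 3)
        out = self.proj(out.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        return x + out  # residual connection, as in standard transformer blocks


if __name__ == "__main__":
    clip = torch.randn(1, 5, 64, 20, 20)      # 5 consecutive 64-channel feature maps
    print(ConvSelfAttention(64)(clip).shape)  # torch.Size([1, 5, 64, 20, 20])
```

An actual deployment on the RV1126's NPU would additionally require quantization and export through Rockchip's NPU toolchain, which this sketch does not attempt.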
Related papers
- The ATLAS of Traffic Lights: A Reliable Perception Framework for Autonomous Driving [9.932968493913357]
We propose a modularized perception framework that integrates state-of-the-art detection models with a novel real-time association and decision framework.
We introduce the ATLAS dataset, which provides comprehensive annotations of traffic light states and pictograms.
We train and evaluate several state-of-the-art traffic light detection architectures on ATLAS, demonstrating significant performance improvements in both accuracy and robustness.
arXiv Detail & Related papers (2025-04-28T12:15:42Z)
- Real-Time Navigation for Autonomous Aerial Vehicles Using Video [11.414350041043326]
We introduce a novel Markov Decision Process (MDP) framework to reduce the workload of Computer Vision (CV) algorithms.
We apply our proposed framework to both feature-based and neural-network-based object-detection tasks.
These holistic tests show significant benefits in energy consumption and speed with only a limited loss in accuracy.
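The abstract does not spell out the MDP's states or rewards, so the scheduling idea can only be illustrated with a toy model: let the state be the number of frames since the detector last ran, and let the agent choose between skipping (cheap, but results go stale) and re-running detection (accurate, but costly). All states, costs, and the penalty model below are invented for illustration.

```python
# Toy MDP for scheduling an expensive CV call; not the paper's formulation.
import numpy as np

N = 6                    # staleness levels: frames since the detector last ran
gamma = 0.9              # discount factor
detect_cost = 1.0        # energy cost of one full detector pass
stale_penalty = 0.4      # per-level penalty for acting on stale detections

V = np.zeros(N)
for _ in range(200):     # value iteration on the toy MDP
    # skip: pay the staleness penalty, staleness grows (capped at N - 1)
    q_skip = -stale_penalty * np.arange(N) + gamma * V[np.minimum(np.arange(N) + 1, N - 1)]
    # detect: pay the energy cost, staleness resets to 0
    q_detect = np.full(N, -detect_cost + gamma * V[0])
    V = np.maximum(q_skip, q_detect)

for s, a in enumerate(np.where(q_skip >= q_detect, "skip", "detect")):
    print(f"staleness {s}: {a}")
```

Value iteration here yields a threshold policy, skipping while detections are fresh and re-running the detector once staleness outweighs its energy cost, which mirrors the energy/accuracy trade-off the abstract reports.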
arXiv Detail & Related papers (2025-04-01T01:14:42Z)
- Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging. We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z)
- Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework [62.47416496137193]
We propose a surveillance video assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. The architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time.
arXiv Detail & Related papers (2025-03-06T07:36:06Z)
- Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights [0.0]
This paper presents an advanced mapping system that combines drone imagery with machine learning and computer vision to overcome challenges in speed, accuracy, and adaptability across diverse terrains. The system produces seamless, high-resolution maps with minimal latency, offering strategic advantages in defense operations.
arXiv Detail & Related papers (2024-12-28T16:47:18Z)
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded by moving vehicles, shadows, and glare.
We propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues.
We show that exploiting available camera intrinsic data and a ground-plane assumption for cross-frame correspondence can lead to a lightweight network with significantly improved performance in both speed and accuracy.
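As a geometric aside, the cross-frame correspondence mentioned above can be illustrated with the standard plane-induced homography H = K (R + t nᵀ / d) K⁻¹, which maps ground-plane pixels between two camera poses given intrinsics K, inter-frame motion (R, t), ground normal n, and camera height d. The sketch below uses invented numbers and is not the HomoFusion implementation.

```python
# Plane-induced homography between two frames; all values are illustrative.
import numpy as np

K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])        # assumed pinhole intrinsics

# Rigid motion taking camera-1 coordinates to camera-2 coordinates
# (X2 = R @ X1 + t): driving 0.5 m forward brings the scene 0.5 m closer.
R = np.eye(3)
t = np.array([[0.0], [0.0], [-0.5]])

n = np.array([[0.0], [1.0], [0.0]])           # ground normal (y axis points down)
d = 1.5                                       # camera height above the road (m)

# Pixels on the plane n . X1 = d satisfy x2 ~ H x1.
H = K @ (R + (t @ n.T) / d) @ np.linalg.inv(K)

def warp(u: float, v: float) -> np.ndarray:
    """Map a road-surface pixel from frame 1 into frame 2."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

print(warp(640.0, 600.0))   # the road pixel slides toward the image bottom
```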
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
- Region of Interest (ROI) based adaptive cross-layer system for real-time video streaming over Vehicular Ad-hoc NETworks (VANETs) [2.2124180701409233]
We propose an algorithm that improves end-to-end video transmission quality in a vehicular context.
The proposed low-complexity solution gives the highest priority to the regions of interest in the scene.
Realistic VANET simulation results demonstrate that for HEVC-compressed video communications, the proposed system offers PSNR gains of up to 11 dB on the ROI part.
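For reference, the quoted gain is PSNR measured only on the region of interest. A minimal sketch of ROI-level versus whole-frame PSNR on synthetic data follows; the frames, noise levels, and ROI box are placeholders, not the paper's experiment.

```python
# ROI vs. whole-frame PSNR on synthetic frames; numbers are illustrative.
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 255.0, (360, 640))             # stand-in reference frame
rec = ref + rng.normal(0.0, 15.0, ref.shape)          # heavily degraded background
x0, y0, x1, y1 = 200, 40, 440, 160                    # ROI protected by the scheme
rec[y0:y1, x0:x1] = ref[y0:y1, x0:x1] + rng.normal(0.0, 2.0, (y1 - y0, x1 - x0))

print(f"ROI PSNR:         {psnr(ref[y0:y1, x0:x1], rec[y0:y1, x0:x1]):.1f} dB")
print(f"Whole-frame PSNR: {psnr(ref, rec):.1f} dB")
```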
arXiv Detail & Related papers (2023-11-05T13:56:04Z)
- FARSEC: A Reproducible Framework for Automatic Real-Time Vehicle Speed Estimation Using Traffic Cameras [14.339217121537537]
Transportation-dependent systems, such as those for navigation and logistics, have great potential to benefit from reliable speed estimation.
We provide a novel framework for automatic real-time vehicle speed calculation, which copes with more diverse data from publicly available traffic cameras.
Our framework is capable of handling realistic conditions such as camera movements and different video stream inputs automatically.
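The abstract leaves the estimation details to the paper; a back-of-the-envelope sketch of the underlying idea, tracking a vehicle's position across frames and converting pixel displacement to speed through an assumed ground-plane scale, looks like this (the frame rate, scale, and track are all invented):

```python
# Toy camera-based speed estimate; not the FARSEC pipeline.
FPS = 30.0                # assumed stream frame rate
M_PER_PX = 0.05           # assumed metres per pixel on the ground plane here

# Invented centroid track of one vehicle over three consecutive frames (px).
track = [(120.0, 300.0), (130.5, 300.4), (141.2, 300.9)]

def speed_kmh(p0, p1, frames):
    """Average speed between two tracked positions, `frames` frames apart."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    metres = (dx ** 2 + dy ** 2) ** 0.5 * M_PER_PX
    return metres / (frames / FPS) * 3.6   # m/s -> km/h

print(f"{speed_kmh(track[0], track[-1], frames=2):.1f} km/h")
```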
arXiv Detail & Related papers (2023-09-25T19:02:40Z)
- aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception [0.0]
This dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view.
The collected data was captured in highway, urban, and suburban areas during daytime, night, and rain.
We trained unimodal and multimodal baseline models for 3D object detection.
arXiv Detail & Related papers (2022-11-17T10:19:59Z)
- Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs.
All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z)
- Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper, we propose pose estimation software exploiting neural network architectures.
We show how low-power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
- VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles [131.2240621036954]
We present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles.
Using high fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras.
We demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full scale autonomous vehicle.
arXiv Detail & Related papers (2021-11-23T18:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.