Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach YOLO With Video-LLaVA
- URL: http://arxiv.org/abs/2410.05096v1
- Date: Mon, 7 Oct 2024 14:50:56 GMT
- Title: Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach YOLO With Video-LLaVA
- Authors: Mehdi Azarafza, Fatima Idrees, Ali Ehteshami Bejnordi, Charles Steinmetz, Stefan Henkler, Achim Rettberg
- Abstract summary: This paper proposes a method that combines video analysis and reasoning, prompting a large vision model with human-in-the-loop guidance to improve YOLO's accuracy.
It is hypothesized that the guided prompting and reasoning abilities of Video-LLaVA can enhance YOLO's traffic sign detection capabilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traffic Sign Recognition (TSR) is a crucial component of autonomous vehicles. While You Only Look Once (YOLO) is a popular real-time object detection algorithm, factors like training data quality and adverse weather conditions (e.g., heavy rain) can lead to detection failures. These failures can be particularly dangerous when visual similarities between objects exist, such as mistaking a 30 km/h sign for a higher speed limit sign. This paper proposes a method that combines video analysis and reasoning, prompting a large vision model with human-in-the-loop guidance, to improve YOLO's accuracy in detecting road speed limit signs, especially in semi-real-world conditions. It is hypothesized that the guided prompting and reasoning abilities of Video-LLaVA can enhance YOLO's traffic sign detection capabilities. This hypothesis is supported by an evaluation based on human-annotated accuracy metrics within a dataset of videos recorded from the CARLA car simulator. The results demonstrate that a collaborative approach combining YOLO with Video-LLaVA and reasoning can effectively address challenging situations such as heavy rain and overcast conditions that hinder YOLO's detection capabilities.
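To make the described collaboration concrete, here is a minimal sketch of how such a pipeline could be wired together. It assumes the `ultralytics` YOLO package; `video_llava_answer` is a hypothetical placeholder for an actual Video-LLaVA inference call, and the guiding prompt is illustrative, not the authors' wording.

```python
# Minimal sketch: YOLO proposes sign detections; a Video-LLaVA-style model
# re-checks low-confidence ones using a human-tuned guiding prompt.
from ultralytics import YOLO  # assumed dependency

def video_llava_answer(prompt: str, video_path: str) -> str:
    """Hypothetical stand-in for a Video-LLaVA inference call."""
    raise NotImplementedError("wire the actual VLM backend in here")

# Human-in-the-loop: an engineer iterates on this prompt wording.
GUIDE_PROMPT = ("Watch the clip and reason step by step: which speed-limit "
                "sign is visible? Answer with one of: 30, 60, 90, none.")

def detect_with_verification(frame, video_path: str, conf_threshold=0.5):
    detector = YOLO("yolov8n.pt")             # illustrative checkpoint
    result = detector(frame)[0]
    labels = []
    for box in result.boxes:
        label = detector.names[int(box.cls)]
        if float(box.conf) < conf_threshold:  # weak detection: defer to the VLM
            label = video_llava_answer(GUIDE_PROMPT, video_path)
        labels.append(label)
    return labels
```

The design point is that the vision-language model is consulted only when YOLO is unsure, keeping the fast detector on the hot path.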
Related papers
- YOLO-PPA based Efficient Traffic Sign Detection for Cruise Control in Autonomous Driving [10.103731437332693]
Detecting traffic signs efficiently and accurately is very important in autonomous driving systems.
Existing object detection algorithms can hardly detect such small-scale signs.
A YOLO-PPA-based traffic sign detection algorithm is proposed in this paper.
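The PPA module itself is not described in this summary, so the sketch below shows a generic remedy for small signs instead, tiling a high-resolution frame into overlapping crops before running any detector; it is not the paper's method.

```python
# Generic tiling sketch for small-object detection (not the YOLO-PPA method).
import numpy as np

def tile_frame(frame: np.ndarray, tile: int = 640, overlap: int = 64):
    """Yield (x0, y0, crop) tiles so small signs cover more detector pixels."""
    h, w = frame.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, frame[y0:y0 + tile, x0:x0 + tile]

# Usage: run the detector on each tile, shift boxes by (x0, y0), then merge
# duplicates across overlapping tiles with non-maximum suppression.
```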
arXiv Detail & Related papers (2024-09-05T07:49:21Z)
- Real-Time Traffic Sign Detection: A Case Study in a Santa Clara Suburban Neighborhood [2.4087090457198435]
The project's primary objectives are to train the YOLOv5 model on a diverse dataset of traffic sign images and deploy the model on a suitable hardware platform.
The performance of the deployed system will be evaluated based on its accuracy in detecting traffic signs, real-time processing speed, and overall reliability.
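As a rough illustration of the real-time criterion, this sketch times YOLOv5 inference through the public `torch.hub` entry point; the checkpoint and random input are placeholders, not the project's dataset or target hardware.

```python
# Hedged sketch: estimate frames per second for YOLOv5 inference.
import time
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
imgs = torch.rand(1, 3, 640, 640)  # stand-in for a preprocessed camera frame

n = 50
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n):
        model(imgs)
fps = n / (time.perf_counter() - start)
print(f"approx. {fps:.1f} FPS on this machine")
```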
arXiv Detail & Related papers (2023-10-14T17:52:28Z)
- Unsupervised Driving Event Discovery Based on Vehicle CAN-data [62.997667081978825]
This work presents a simultaneous clustering and segmentation approach for vehicle CAN-data that identifies common driving events in an unsupervised manner.
We evaluate our approach with a dataset of real Tesla Model 3 vehicle CAN-data and a two-hour driving session that we annotated with different driving events.
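The paper's simultaneous clustering-and-segmentation model is not detailed here; as a generic illustration of the idea, the sketch below clusters sliding windows of CAN-like signals, so that runs of consecutive windows sharing a label form candidate driving events.

```python
# Hedged sketch: discover recurring driving events by clustering sliding
# windows of CAN-like channels (speed, steering, ...). Illustrative only.
import numpy as np
from sklearn.cluster import KMeans

def window_features(signals: np.ndarray, win: int = 50, hop: int = 25):
    """signals: (T, C) array of CAN channels -> (N, 2*C) mean/std features."""
    feats = []
    for t in range(0, len(signals) - win + 1, hop):
        w = signals[t:t + win]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(feats)

signals = np.random.rand(2000, 3)  # placeholder for real CAN channels
labels = KMeans(n_clusters=5, n_init=10).fit_predict(window_features(signals))
# Consecutive windows with the same label form one segment, i.e. one event.
```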
arXiv Detail & Related papers (2023-01-12T13:10:47Z)
- A Real-Time Wrong-Way Vehicle Detection Based on YOLO and Centroid Tracking [0.0]
Wrong-way driving is one of the main causes of road accidents and traffic jams all over the world.
In this paper, we propose an automatic wrong-way vehicle detection system from on-road surveillance camera footage.
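The summary names YOLO plus centroid tracking; this sketch shows the tracking half in a generic form, greedy nearest-centroid association followed by a net-motion test against the lane's allowed direction. Thresholds and the direction vector are illustrative.

```python
# Hedged sketch of centroid tracking with a wrong-way test.
import numpy as np

def centroid(box):  # box = (x1, y1, x2, y2) from any detector, e.g. YOLO
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def update_tracks(tracks, boxes, max_dist=50.0):
    """tracks: {id: [centroids...]}; greedy nearest-centroid association."""
    for c in (centroid(b) for b in boxes):
        tid = min(tracks, default=None,
                  key=lambda t: np.linalg.norm(tracks[t][-1] - c))
        if tid is not None and np.linalg.norm(tracks[tid][-1] - c) < max_dist:
            tracks[tid].append(c)
        else:
            tracks[len(tracks)] = [c]  # start a new track
    return tracks

def is_wrong_way(track, allowed_dir=np.array([0.0, 1.0]), min_len=5):
    """Flag a track whose net motion opposes the lane's allowed direction."""
    if len(track) < min_len:
        return False
    return float(np.dot(track[-1] - track[0], allowed_dir)) < 0.0
```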
arXiv Detail & Related papers (2022-10-19T00:53:28Z)
- Threat Detection In Self-Driving Vehicles Using Computer Vision [0.0]
We propose a threat detection mechanism for autonomous self-driving cars using dashcam videos.
There are four major components, including YOLO to identify objects, an advanced lane detection algorithm, and a multi-regression model to measure the distance of objects from the camera.
The final accuracy of our proposed Threat Detection Model (TDM) is 82.65%.
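The "multi-regression model" is not detailed in this summary; the sketch below illustrates the core idea with synthetic data: under a pinhole camera, the apparent box size scales roughly as 1/distance, so distance can be regressed from inverse box dimensions.

```python
# Hedged sketch: regress object distance from bounding-box geometry.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_dist = rng.uniform(5, 60, 200)                    # metres (synthetic)
bbox_h = 800.0 / true_dist + rng.normal(0, 0.5, 200)   # pixels (synthetic)
bbox_w = 300.0 / true_dist + rng.normal(0, 0.5, 200)

X = np.column_stack([1.0 / bbox_h, 1.0 / bbox_w])      # 1/size ~ distance
model = LinearRegression().fit(X, true_dist)
print("est. distance for a 20x8 px box:",
      model.predict([[1 / 20, 1 / 8]])[0], "m")
```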
arXiv Detail & Related papers (2022-09-06T12:01:07Z)
- Driving Anomaly Detection Using Conditional Generative Adversarial Network [26.45460503638333]
This study proposes an unsupervised method to quantify driving anomalies using a conditional generative adversarial network (GAN).
The approach predicts upcoming driving scenarios by conditioning the models on the previously observed signals.
The results are validated with perceptual evaluations, where annotators are asked to assess the risk and familiarity of the videos detected with high anomaly scores.
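The scoring idea, predicting the upcoming window and measuring how surprising the observation is, can be illustrated with a tiny predictor standing in for the conditional GAN generator; sizes and signals below are illustrative.

```python
# Hedged sketch: anomaly score = error between predicted and observed windows.
import torch
import torch.nn as nn

class NextWindowPredictor(nn.Module):
    """Tiny MLP stand-in for the paper's conditional generator."""
    def __init__(self, window: int = 32, channels: int = 4):
        super().__init__()
        d = window * channels
        self.shape = (window, channels)
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(d, 128),
                                 nn.ReLU(), nn.Linear(128, d))

    def forward(self, past):  # past: (B, window, channels)
        return self.net(past).view(-1, *self.shape)

@torch.no_grad()
def anomaly_score(model, past, observed):
    """Higher prediction error = more anomalous driving segment."""
    return torch.mean((model(past) - observed) ** 2, dim=(1, 2))

model = NextWindowPredictor()   # untrained stand-in
past = torch.randn(8, 32, 4)    # e.g. recent CAN + physiological signals
observed = torch.randn(8, 32, 4)
print(anomaly_score(model, past, observed))
```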
arXiv Detail & Related papers (2022-03-15T22:10:01Z)
- Real Time Monocular Vehicle Velocity Estimation using Synthetic Data [78.85123603488664]
We look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car.
We propose a two-step approach where first an off-the-shelf tracker is used to extract vehicle bounding boxes and then a small neural network is used to regress the vehicle velocity.
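This is a sketch of the second step only: a small network regressing velocity from a short track of bounding-box states produced by any off-the-shelf tracker. Feature choices and dimensions are illustrative.

```python
# Hedged sketch: regress vehicle velocity from a short bounding-box track.
import torch
import torch.nn as nn

class VelocityHead(nn.Module):
    """Tiny regressor: a track of box states -> scalar velocity."""
    def __init__(self, track_len: int = 10, box_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(),
                                 nn.Linear(track_len * box_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, tracks):               # tracks: (B, track_len, box_dim)
        return self.net(tracks).squeeze(-1)  # (B,) velocity estimates

head = VelocityHead()
boxes = torch.randn(2, 10, 4)  # (x, y, w, h) per frame from the tracker
print(head(boxes))             # untrained output, shown only for shapes
```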
arXiv Detail & Related papers (2021-09-16T13:10:27Z)
- Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection [60.89616629421904]
Lidar-based object detectors are critical parts of the 3D perception pipeline in autonomous navigation systems such as self-driving cars.
They are sensitive to adverse weather conditions such as rain, snow, and fog due to reduced signal-to-noise ratio (SNR) and signal-to-background ratio (SBR).
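The physical intuition can be sketched with a Beer-Lambert-style extinction term: returns are attenuated with range and weak returns are dropped. This is a simplified illustration, not the LISA model itself; the extinction coefficient below is arbitrary.

```python
# Hedged sketch: simulate rain by attenuating lidar returns with range.
import numpy as np

def attenuate_point_cloud(points, intensity, alpha=0.004, min_intensity=0.05):
    """points: (N, 3) xyz; intensity: (N,); alpha: extinction coeff [1/m]."""
    r = np.linalg.norm(points, axis=1)           # range to each return
    wet = intensity * np.exp(-2.0 * alpha * r)   # two-way attenuation
    keep = wet > min_intensity                   # weak returns are lost
    return points[keep], wet[keep]

pts = np.random.uniform(-50, 50, (1000, 3))
inten = np.random.uniform(0.1, 1.0, 1000)
wet_pts, wet_inten = attenuate_point_cloud(pts, inten)
print(f"kept {len(wet_pts)}/{len(pts)} points after simulated rain")
```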
arXiv Detail & Related papers (2021-07-14T21:10:47Z)
- Driving-Signal Aware Full-Body Avatars [49.89791440532946]
We present a learning-based method for building driving-signal aware full-body avatars.
Our model is a conditional variational autoencoder that can be animated with incomplete driving signals.
We demonstrate the efficacy of our approach on the challenging problem of full-body animation for virtual telepresence.
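The architectural idea, a conditional decoder that tolerates incomplete driving signals, can be sketched by concatenating a validity mask with the signals so the network can distinguish "zero" from "missing". Sizes are illustrative and this is not the paper's avatar model.

```python
# Hedged sketch: a decoder conditioned on masked, possibly-missing signals.
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    def __init__(self, latent=16, signal=8, out=32):
        super().__init__()
        # Mask is concatenated so missing signals are not mistaken for zeros.
        self.net = nn.Sequential(nn.Linear(latent + 2 * signal, 128),
                                 nn.ReLU(), nn.Linear(128, out))

    def forward(self, z, signals, mask):
        cond = torch.cat([signals * mask, mask], dim=-1)
        return self.net(torch.cat([z, cond], dim=-1))

dec = ConditionalDecoder()
z = torch.randn(1, 16)                   # sampled latent code
signals = torch.randn(1, 8)              # e.g. sparse driving signals
mask = (torch.rand(1, 8) > 0.5).float()  # 1 = observed, 0 = missing
print(dec(z, signals, mask).shape)       # torch.Size([1, 32])
```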
arXiv Detail & Related papers (2021-05-21T16:22:38Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
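Attention-based fusion of the two modalities can be sketched with standard multi-head attention, image tokens querying LiDAR tokens; TransFuser itself interleaves transformer fusion at multiple scales, so this is only a minimal stand-in.

```python
# Hedged sketch: image features attend to LiDAR features and get fused.
import torch
import torch.nn as nn

d_model, heads = 64, 4
attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

img_tokens = torch.randn(1, 100, d_model)   # flattened image feature map
lidar_tokens = torch.randn(1, 64, d_model)  # flattened BEV LiDAR features

fused, _ = attn(query=img_tokens, key=lidar_tokens, value=lidar_tokens)
print(fused.shape)  # torch.Size([1, 100, 64])
```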
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Deep traffic light detection by overlaying synthetic context on arbitrary natural images [49.592798832978296]
We propose a method to generate artificial traffic-related training data for deep traffic light detectors.
This data is generated using basic non-realistic computer graphics to blend fake traffic scenes on top of arbitrary image backgrounds.
It also tackles the intrinsic data imbalance problem in traffic light datasets, caused mainly by the low number of samples of the yellow state.
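The generation recipe can be sketched with simple alpha compositing, oversampling the rare yellow state; paths, sprites, and weights below are placeholders, not the paper's assets.

```python
# Hedged sketch: paste a rendered traffic light onto an arbitrary background.
import random
from PIL import Image

STATE_WEIGHTS = {"red": 1.0, "green": 1.0, "yellow": 3.0}  # boost rare class

def synth_sample(background_path: str, sprite_paths: dict):
    state = random.choices(list(STATE_WEIGHTS),
                           weights=list(STATE_WEIGHTS.values()))[0]
    bg = Image.open(background_path).convert("RGB")
    sprite = Image.open(sprite_paths[state]).convert("RGBA")
    x = random.randint(0, max(bg.width - sprite.width, 0))
    y = random.randint(0, max(bg.height - sprite.height, 0))
    bg.paste(sprite, (x, y), mask=sprite)  # alpha-blend the fake light
    bbox = (x, y, x + sprite.width, y + sprite.height)
    return bg, bbox, state                 # image + label for the detector
```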
arXiv Detail & Related papers (2020-11-07T19:57:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.