V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?
- URL: http://arxiv.org/abs/2408.10872v4
- Date: Tue, 22 Jul 2025 10:18:50 GMT
- Title: V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?
- Authors: Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo, Xinglei Wang, Ilya Ilyankou, Kerkritt Sriroongvikrai, Nicola Christie, Meihui Wang, Huanfa Chen, James Haworth
- Abstract summary: Road safety assessments are critical yet costly, especially in Low- and Middle-Income Countries (LMICs). Traditional methods require expert annotation and training data, while supervised learning-based approaches struggle to generalise across regions. We introduce V-RoAst, a zero-shot Visual Question Answering framework using Vision-Language Models (VLMs) to classify road safety attributes.
- Score: 1.3201295431850615
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Road safety assessments are critical yet costly, especially in Low- and Middle-Income Countries (LMICs), where most roads remain unrated. Traditional methods require expert annotation and training data, while supervised learning-based approaches struggle to generalise across regions. In this paper, we introduce V-RoAst, a zero-shot Visual Question Answering (VQA) framework using Vision-Language Models (VLMs) to classify road safety attributes defined by the iRAP standard. We introduce the first open-source dataset from ThaiRAP, consisting of over 2,000 curated street-level images from Thailand annotated for this task. We evaluate Gemini-1.5-flash and GPT-4o-mini on this dataset and benchmark their performance against VGGNet and ResNet baselines. While VLMs underperform on spatial awareness, they generalise well to unseen classes and offer flexible prompt-based reasoning without retraining. Our results show that VLMs can serve as automatic road assessment tools when integrated with complementary data. This work is the first to explore VLMs for zero-shot infrastructure risk assessment and opens new directions for automatic, low-cost road safety mapping. Code and dataset: https://github.com/PongNJ/V-RoAst.
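To make the zero-shot setup concrete, the sketch below shows how a single iRAP-style attribute could be queried from a street-level image with a VLM. It is a minimal illustration under stated assumptions, not the authors' pipeline: it uses the OpenAI Python client with gpt-4o-mini, and the prompt wording, attribute name, and class options are placeholders; the actual prompts, attribute list, and evaluation code live in the repository linked above.

```python
# Minimal zero-shot VQA sketch (illustrative only, not the authors' code):
# ask a VLM to assign one class to a single iRAP-style road attribute.
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def classify_attribute(image_path: str, attribute: str, options: list[str]) -> str:
    """Prompt the VLM to pick exactly one class for one road attribute."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    prompt = (
        "You are a road safety assessor following the iRAP coding manual. "
        f"Look at the street-level image and classify the attribute '{attribute}'. "
        f"Answer with exactly one of: {', '.join(options)}."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


# Hypothetical usage; the attribute name and class options below are
# placeholders, not the official iRAP coding categories.
# print(classify_attribute("street_view.jpg", "Number of lanes",
#                          ["One", "Two", "Three or more"]))
```

Constraining the answer to a fixed option list keeps the free-form VLM output mappable onto the discrete attribute codes that a star-rating model expects.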
Related papers
- SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation [27.135615596331263]
Vision-language models (VLMs) can be utilized to enhance the safety of autonomous driving systems. Existing research has largely overlooked the evaluation of these models in traffic safety-critical driving scenarios. We propose a new baseline based on a VLM with knowledge graph-based retrieval-augmented generation for visual question answering.
arXiv Detail & Related papers (2025-07-29T08:40:17Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - HazardNet: A Small-Scale Vision Language Model for Real-Time Traffic Safety Detection at Edge Devices [5.233512464561313]
This paper introduces HazardNet, a small-scale Vision Language Model designed to enhance traffic safety.
We built HazardNet by fine-tuning the pre-trained Qwen2-VL-2B model, chosen for its superior performance among open-source alternatives.
We present HazardQA, a novel Vision Question Answering dataset constructed specifically for training HazardNet on real-world scenarios.
arXiv Detail & Related papers (2025-02-27T22:21:45Z) - Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives [56.528835143531694]
We introduce DriveBench, a benchmark dataset designed to evaluate Vision-Language Models (VLMs). Our findings reveal that VLMs often generate plausible responses derived from general knowledge or textual cues rather than true visual grounding. We propose refined evaluation metrics that prioritize robust visual grounding and multi-modal understanding.
arXiv Detail & Related papers (2025-01-07T18:59:55Z) - OpenLKA: an open dataset of lane keeping assist from market autonomous vehicles [23.083443555590065]
Lane Keeping Assist (LKA) has become a standard feature in recent car models. However, LKA systems' operational characteristics and safety performance remain underexplored. We extensively tested mainstream LKA systems from leading U.S. automakers in Tampa, Florida.
arXiv Detail & Related papers (2025-01-06T04:46:10Z) - MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We present a novel immersion-aware model trading framework that incentivizes metaverse users (MUs) to contribute learning models for augmented reality (AR) services in the vehicular metaverse.
Considering dynamic network conditions and privacy concerns, we formulate the reward decisions of metaverse service providers (MSPs) as a multi-agent Markov decision process.
Experimental results demonstrate that the proposed framework can effectively provide higher-value models for object detection and classification in AR services on real AR-related vehicle datasets.
arXiv Detail & Related papers (2024-10-25T16:20:46Z) - ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding [5.914751204116458]
We introduce ScVLM, a novel hybrid methodology that integrates supervised and contrastive learning techniques to classify the severity and types of safety-critical events (SCEs). The proposed approach is trained and evaluated on more than 8,600 SCEs from the Second Strategic Highway Research Program Naturalistic Driving Study dataset.
arXiv Detail & Related papers (2024-10-01T18:10:23Z) - An Explainable Machine Learning Approach to Traffic Accident Fatality Prediction [0.02730969268472861]
Road traffic accidents pose a significant public health threat worldwide.
This study presents a machine learning-based approach for classifying fatal and non-fatal road accident outcomes.
arXiv Detail & Related papers (2024-09-18T12:41:56Z) - Computer vision-based model for detecting turning lane features on Florida's public roadways [2.5849315636929475]
This study detects roadway features on Florida's public roads from high-resolution aerial images using AI.
The extracted roadway geometry data can be integrated with crash and traffic data to provide valuable insights to policymakers and roadway users.
arXiv Detail & Related papers (2024-06-13T05:28:53Z) - A Bi-Objective Approach to Last-Mile Delivery Routing Considering Driver Preferences [42.16665455951525]
The Multi-Objective Vehicle Routing Problem (MOVRP) is a complex optimization problem in the transportation and logistics industry.
This paper proposes a novel approach to the MOVRP that aims to create routes that consider drivers' and operators' decisions and preferences.
We evaluate two approaches to address this objective: visually attractive route planning and data mining of historical driver behavior to plan similar routes.
arXiv Detail & Related papers (2024-05-25T04:25:00Z) - DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems. We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z) - Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view.
To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv Detail & Related papers (2023-12-13T23:06:30Z) - How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs [55.91371032213854]
This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning.
We introduce a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.
arXiv Detail & Related papers (2023-11-27T18:59:42Z) - RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving [67.09546127265034]
Road surface reconstruction helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems.
We introduce the Road Surface Reconstruction dataset, a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions.
It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps.
arXiv Detail & Related papers (2023-10-03T17:59:32Z) - Autonomous and Human-Driven Vehicles Interacting in a Roundabout: A Quantitative and Qualitative Evaluation [34.67306374722473]
We learn a policy to minimize traffic jams and pollution in a roundabout in Milan, Italy.
We qualitatively evaluate the learned policy using a cutting-edge cockpit to assess its performance in near-real-world conditions.
Our findings show that human-driven vehicles benefit from optimizing AV dynamics.
arXiv Detail & Related papers (2023-09-15T09:02:16Z) - Continual Cross-Dataset Adaptation in Road Surface Classification [4.470499157873342]
Deep learning models for road surface classification suffer from poor generalization when tested on unseen datasets.
We propose to employ continual learning finetuning methods designed to retain past knowledge while adapting to new data, thus effectively avoiding forgetting.
arXiv Detail & Related papers (2023-09-05T13:18:52Z) - A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness [52.27309191283943]
This paper presents a data-driven framework for assessing the risk of different AVs' behaviors.
We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision.
arXiv Detail & Related papers (2023-08-02T09:48:08Z) - Dynamic loss balancing and sequential enhancement for road-safety assessment and traffic scene classification [0.0]
Road-safety inspection is an indispensable instrument for reducing road-accident fatalities attributed to road infrastructure.
Recent work formalizes road-safety assessment in terms of carefully selected risk factors that are also known as road-safety attributes.
We propose to reduce dependency on tedious human labor by automating recognition with a two-stage neural architecture.
arXiv Detail & Related papers (2022-11-08T11:10:07Z) - A Survey on Temporal Sentence Grounding in Videos [69.13365006222251]
Temporal sentence grounding in videos (TSGV) aims to localize one target segment from an untrimmed video with respect to a given sentence query.
To the best of our knowledge, this is the first systematic survey on temporal sentence grounding.
arXiv Detail & Related papers (2021-09-16T15:01:46Z) - End-to-end Interpretable Neural Motion Planner [78.69295676456085]
We propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios.
We design a holistic model that takes as input raw LIDAR data and an HD map and produces interpretable intermediate representations.
We demonstrate the effectiveness of our approach in real-world driving data captured in several cities in North America.
arXiv Detail & Related papers (2021-01-17T14:16:12Z) - Out-of-Distribution Detection for Automotive Perception [58.34808836642603]
Neural networks (NNs) are widely used for object classification in autonomous driving.
NNs can fail on input data not well represented by the training dataset, known as out-of-distribution (OOD) data.
This paper presents a method for determining whether inputs are OOD, which does not require OOD data during training and does not increase the computational cost of inference.
arXiv Detail & Related papers (2020-11-03T01:46:35Z) - Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach [88.45509934702913]
We design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed.
We incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS.
By applying the proposed trained model, an effective real-time trajectory policy for the UAV-BSs captures the observable network states over time.
arXiv Detail & Related papers (2020-02-21T07:29:15Z) - Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling [65.99956848461915]
Vision-and-Language Navigation (VLN) is a task where agents must decide how to move through a 3D environment to reach a goal.
One of the problems of the VLN task is data scarcity since it is difficult to collect enough navigation paths with human-annotated instructions for interactive environments.
We propose an adversarial-driven counterfactual reasoning model that can consider effective conditions instead of low-quality augmented data.
arXiv Detail & Related papers (2019-11-17T18:02:51Z)