DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving
- URL: http://arxiv.org/abs/2506.17590v1
- Date: Sat, 21 Jun 2025 05:01:42 GMT
- Authors: Mihir Godbole, Xiangbo Gao, Zhengzhong Tu
- Abstract summary: No existing benchmark evaluates multi-class intent prediction in safety-critical situations. We introduce DRAMA-X, a fine-grained benchmark constructed from the DRAMA dataset. We propose SGG-Intent, a lightweight, training-free framework that mirrors the ego vehicle's reasoning pipeline.
- Score: 5.362063089413001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the short-term motion of vulnerable road users (VRUs) like pedestrians and cyclists is critical for safe autonomous driving, especially in urban scenarios with ambiguous or high-risk behaviors. While vision-language models (VLMs) have enabled open-vocabulary perception, their utility for fine-grained intent reasoning remains underexplored. Notably, no existing benchmark evaluates multi-class intent prediction in safety-critical situations. To address this gap, we introduce DRAMA-X, a fine-grained benchmark constructed from the DRAMA dataset via an automated annotation pipeline. DRAMA-X contains 5,686 accident-prone frames labeled with object bounding boxes, a nine-class directional intent taxonomy, binary risk scores, expert-generated action suggestions for the ego vehicle, and descriptive motion summaries. These annotations enable a structured evaluation of four interrelated tasks central to autonomous decision-making: object detection, intent prediction, risk assessment, and action suggestion. As a reference baseline, we propose SGG-Intent, a lightweight, training-free framework that mirrors the ego vehicle's reasoning pipeline. It sequentially generates a scene graph from visual input using VLM-backed detectors, infers intent, assesses risk, and recommends an action using a compositional reasoning stage powered by a large language model. We evaluate a range of recent VLMs, comparing performance across all four DRAMA-X tasks. Our experiments demonstrate that scene-graph-based reasoning enhances intent prediction and risk assessment, especially when contextual cues are explicitly modeled.
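The abstract describes SGG-Intent as a sequential, training-free pipeline: scene-graph generation, intent inference, risk assessment, then action suggestion. The following is a minimal, hypothetical sketch of that four-stage flow; all names, the stub logic, and the data types are illustrative assumptions and stand in for the paper's actual VLM- and LLM-backed components.

```python
from dataclasses import dataclass

# Hypothetical sketch of an SGG-Intent-style pipeline. Every stage below is a
# stub standing in for a VLM- or LLM-backed component; names are illustrative,
# not the authors' actual API.

@dataclass
class SceneObject:
    label: str      # e.g. "pedestrian", "cyclist"
    bbox: tuple     # (x1, y1, x2, y2) in pixels
    relation: str   # spatial relation to the ego vehicle, e.g. "ahead"

def generate_scene_graph(objects):
    """Stage 1: build a minimal scene graph as (subject, relation, object) triples."""
    return [(o.label, o.relation, "ego_vehicle") for o in objects]

def infer_intent(scene_graph):
    """Stage 2: map each detected VRU to a directional intent class (stubbed)."""
    return {subj: "crossing_left" if rel == "ahead" else "stationary"
            for subj, rel, _ in scene_graph}

def assess_risk(intents):
    """Stage 3: binary risk score -- any crossing intent is treated as risky here."""
    return any(i.startswith("crossing") for i in intents.values())

def suggest_action(risky):
    """Stage 4: recommend an ego-vehicle action from the risk assessment."""
    return "slow_down_and_yield" if risky else "maintain_speed"

def sgg_intent_pipeline(objects):
    """Compose the four stages sequentially, mirroring the described pipeline."""
    graph = generate_scene_graph(objects)
    intents = infer_intent(graph)
    risky = assess_risk(intents)
    return graph, intents, risky, suggest_action(risky)

if __name__ == "__main__":
    objects = [SceneObject("pedestrian", (410, 220, 455, 330), "ahead")]
    _, intents, risky, action = sgg_intent_pipeline(objects)
    print(intents, risky, action)
```

The point of the sketch is the composition: each stage consumes only the structured output of the previous one, which is what lets a compositional LLM reasoning stage replace any individual stub.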
Related papers
- Why Braking? Scenario Extraction and Reasoning Utilizing LLM [13.88343221678386]
We propose a novel framework that leverages Large Language Models (LLMs) for scenario understanding and reasoning. Our method bridges the gap between low-level numerical signals and natural language descriptions, enabling LLMs to interpret and classify driving scenarios.
arXiv Detail & Related papers (2025-07-17T08:33:56Z) - VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding [0.0]
Multimodal large language models (MLLMs) have shown promise in enhancing scene understanding and decision making in autonomous vehicles. We present VRU-Accident, a vision-language benchmark designed to evaluate MLLMs in high-risk traffic scenarios involving VRUs. Unlike prior works, our benchmark focuses explicitly on VRU-vehicle accidents, providing rich, fine-grained annotations that capture both spatial-temporal dynamics and causal semantics of accidents.
arXiv Detail & Related papers (2025-07-13T22:14:35Z) - PADriver: Towards Personalized Autonomous Driving [27.96579880234604]
We propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD). Built upon a Multi-modal Large Language Model (MLLM), PADriver takes streaming frames and personalized textual prompts as inputs. We construct a benchmark named PAD-Highway based on the Highway-Env simulator to comprehensively evaluate decision performance under traffic rules.
arXiv Detail & Related papers (2025-05-08T13:36:07Z) - Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems. We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics. We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z) - Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives [56.528835143531694]
We introduce DriveBench, a benchmark dataset designed to evaluate Vision-Language Models (VLMs). Our findings reveal that VLMs often generate plausible responses derived from general knowledge or textual cues rather than true visual grounding. We propose refined evaluation metrics that prioritize robust visual grounding and multi-modal understanding.
arXiv Detail & Related papers (2025-01-07T18:59:55Z) - Zero-shot Hazard Identification in Autonomous Driving: A Case Study on the COOOL Benchmark [0.0]
This paper presents our submission to the COOOL competition, a novel benchmark for detecting and classifying out-of-label hazards in autonomous driving. Our approach integrates diverse methods across three core tasks: (i) driver reaction detection, (ii) hazard object identification, and (iii) hazard captioning. The proposed pipeline outperformed the baseline methods by a large margin, reducing the relative error by 33%, and scored 2nd on the final leaderboard of 32 teams.
arXiv Detail & Related papers (2024-12-27T22:43:46Z) - A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness [52.27309191283943]
This paper presents a data-driven framework for assessing the risk of different AVs' behaviors.
We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision.
arXiv Detail & Related papers (2023-08-02T09:48:08Z) - Realistic Safety-critical Scenarios Search for Autonomous Driving System via Behavior Tree [8.286351881735191]
We propose the Matrix-Fuzzer, a behavior tree-based testing framework, to automatically generate realistic safety-critical test scenarios.
Our approach is able to find the most types of safety-critical scenarios while generating only around 30% of the total scenarios compared with the baseline algorithm.
arXiv Detail & Related papers (2023-05-11T06:53:03Z) - DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z) - Intersection Warning System for Occlusion Risks using Relational Local Dynamic Maps [0.0]
This work addresses the task of risk evaluation in traffic scenarios with limited observability due to restricted sensorial coverage.
To identify the area of sight, we employ ray casting on a local dynamic map providing geometrical information and road infrastructure.
Resulting risk indicators are utilized to evaluate the driver's current behavior, to warn the driver in critical situations, to give suggestions on how to act safely or to plan safe trajectories.
arXiv Detail & Related papers (2023-03-13T16:01:55Z) - DRAMA: Joint Risk Localization and Captioning in Driving [23.091433343825727]
This paper proposes a new research direction of joint risk localization in driving scenes and its risk explanation as a natural language description.
Due to the lack of standard benchmarks, we collected a large-scale dataset, DRAMA (Driving Risk Assessment Mechanism with A captioning module).
Our dataset accommodates video- and object-level questions on driving risks with associated important objects to achieve the goal of visual captioning.
arXiv Detail & Related papers (2022-09-22T03:53:56Z) - Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z) - Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? [104.04999499189402]
Out-of-training-distribution (OOD) scenarios are a common challenge of learning agents at deployment.
We propose an uncertainty-aware planning method, called robust imitative planning (RIP).
Our method can detect and recover from some distribution shifts, reducing the overconfident and catastrophic extrapolations in OOD scenes.
We introduce CARNOVEL, an autonomous-car novel-scene benchmark, to evaluate the robustness of driving agents on a suite of tasks with distribution shifts.
arXiv Detail & Related papers (2020-06-26T11:07:32Z)