NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving
- URL: http://arxiv.org/abs/2509.25944v1
- Date: Tue, 30 Sep 2025 08:37:31 GMT
- Title: NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving
- Authors: Yuan Gao, Mattia Piccinini, Roberto Brusnicki, Yuchen Zhang, Johannes Betz,
- Abstract summary: Understanding risk in autonomous driving requires high-level reasoning about agent behavior and context.<n>Current Vision Language Models (Ms)-based methods primarily ground agents in static images.<n>We propose NuRisk as a critical benchmark for advancing explicit-temporal reasoning in autonomous driving.
- Score: 10.340969230365138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding risk in autonomous driving requires not only perception and prediction, but also high-level reasoning about agent behavior and context. Current Vision Language Models (VLMs)-based methods primarily ground agents in static images and provide qualitative judgments, lacking the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2,900 scenarios and 1.1 million agent-level samples, built on real-world data from nuScenes and Waymo, supplemented with safety-critical scenarios from the CommonRoad simulator. The dataset provides Bird-Eye-View (BEV) based sequential images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, resulting in a peak accuracy of 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lacked. While this represents a significant step forward, the modest accuracy underscores the profound challenge of the task, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving.
Related papers
- Seeing Farther and Smarter: Value-Guided Multi-Path Reflection for VLM Policy Optimization [41.15414881730464]
Vision-Language Models (VLMs) offer a general perceive-reason-act framework for this goal.<n>Previous approaches rely on inefficient and often inaccurate implicit learning of state-values from noisy foresight predictions.<n>We propose a novel test-time computation framework that decouples state evaluation from action generation.
arXiv Detail & Related papers (2026-02-22T22:53:16Z) - Agentic Spatio-Temporal Grounding via Collaborative Reasoning [80.83158605034465]
Temporal Video Grounding aims to retrieve thetemporal tube of a target object or person in a video given a text query.<n>We propose the Agentic Spatio-Temporal Grounder (ASTG) framework for the task of STVG towards an open-world and training-free scenario.<n>Specifically, two specialized agents SRA (Spatial Reasoning Agent) and TRA (Temporal Reasoning Agent) constructed leveraging on modern Multimoal Large Language Models (MLLMs)<n>Experiments on popular benchmarks demonstrate the superiority of the proposed approach where it outperforms existing weakly-supervised and zero-shot approaches by a margin
arXiv Detail & Related papers (2026-02-10T10:16:27Z) - Lost in the Noise: How Reasoning Models Fail with Contextual Distractors [57.31788955167306]
Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information.<n>We introduce NoisyBench, a comprehensive benchmark that systematically evaluates model robustness across 11 datasets in RAG, reasoning, alignment, and tool-use tasks.<n>Our evaluation reveals a catastrophic performance drop of up to 80% in state-of-the-art models when faced with contextual distractors.
arXiv Detail & Related papers (2026-01-12T05:43:51Z) - SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection [6.806105013817923]
SAVANT is a structured reasoning framework that achieves high accuracy and recall in detecting anomalous driving scenarios.<n>By automatically labeling over 9,640 real-world images with high accuracy, SAVANT addresses the critical data scarcity problem in anomaly detection.
arXiv Detail & Related papers (2025-10-20T19:14:29Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework.<n>It reframes the learning task to predict the residual deviation from an inertial reference.<n>On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - Towards Evaluating Proactive Risk Awareness of Multimodal Language Models [38.55193215852595]
A proactive safety artificial intelligence (AI) system would work better than a reactive one.<n>PaSBench evaluates this capability through 416 multimodal scenarios.<n>Top performers like Gemini-2.5-pro achieve 71% image and 64% text accuracy, but miss 45-55% risks in repeated trials.
arXiv Detail & Related papers (2025-05-23T04:28:47Z) - Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving [55.96227460521096]
Vision-Language Models (VLMs) have been integrated into autonomous driving systems to enhance reasoning capabilities.<n>We propose a natural reflection-based backdoor attack targeting VLM systems in autonomous driving scenarios.<n>Our findings uncover a new class of attacks that exploit the stringent real-time requirements of autonomous driving.
arXiv Detail & Related papers (2025-05-09T20:28:17Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications.<n>Our research identifies two critical latent factors affecting RAG's confidence in its predictions.<n>We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - Uncertainty-boosted Robust Video Activity Anticipation [72.14155465769201]
Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision to autonomous driving.
Despite the recent progress, the data uncertainty issue, reflected as the content evolution process and dynamic correlation in event labels, has been somehow ignored.
We propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results.
arXiv Detail & Related papers (2024-04-29T12:31:38Z) - ADoPT: LiDAR Spoofing Attack Detection Based on Point-Level Temporal
Consistency [11.160041268858773]
Deep neural networks (DNNs) are increasingly integrated into LiDAR-based perception systems for autonomous vehicles (AVs)
We aim to address the challenge of LiDAR spoofing attacks, where attackers inject fake objects into LiDAR data and fool AVs to misinterpret their environment and make erroneous decisions.
We propose ADoPT (Anomaly Detection based on Point-level Temporal consistency), which quantitatively measures temporal consistency across consecutive frames and identifies abnormal objects based on the coherency of point clusters.
In our evaluation using the nuScenes dataset, our algorithm effectively counters various LiDAR spoofing attacks, achieving a low (
arXiv Detail & Related papers (2023-10-23T02:31:31Z) - Unsupervised Self-Driving Attention Prediction via Uncertainty Mining
and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show equivalent or even more impressive performance compared to fully-supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z) - Fast nonlinear risk assessment for autonomous vehicles using learned
conditional probabilistic models of agent futures [19.247932561037487]
This paper presents fast non-sampling based methods to assess the risk for trajectories of autonomous vehicles.
The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models.
We construct deterministic linear dynamical systems that govern the exact time evolution of the moments of uncertain position.
arXiv Detail & Related papers (2021-09-21T05:55:39Z) - Fast Risk Assessment for Autonomous Vehicles Using Learned Models of
Agent Futures [10.358493658420173]
This paper presents fast non-sampling based methods to assess the risk of trajectories for autonomous vehicles.
The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models.
The presented methods are demonstrated on realistic predictions from propagates trained on the Argoverse and CARLA datasets.
arXiv Detail & Related papers (2020-05-27T16:16:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.