From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
- URL: http://arxiv.org/abs/2502.02145v3
- Date: Wed, 21 May 2025 07:47:01 GMT
- Title: From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios
- Authors: Yuan Gao, Mattia Piccinini, Korbinian Moller, Amr Alanwar, Johannes Betz,
- Abstract summary: Large Language Models (LLMs) and structured scenario parsing and prompt engineering are used to generate safety-critical driving scenarios.<n>We validate our approach using a 2D simulation framework and multiple pre-trained LLMs.<n>We conclude that an LLM equipped with domain-informed prompting techniques can effectively evaluate and generate safety-critical driving scenarios.
- Score: 6.681744368557208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring the safety of autonomous vehicles requires virtual scenario-based testing, which depends on the robust evaluation and generation of safety-critical scenarios. So far, researchers have used scenario-based testing frameworks that rely heavily on handcrafted scenarios as safety metrics. To reduce the effort of human interpretation and overcome the limited scalability of these approaches, we combine Large Language Models (LLMs) with structured scenario parsing and prompt engineering to automatically evaluate and generate safety-critical driving scenarios. We introduce Cartesian and Ego-centric prompt strategies for scenario evaluation, and an adversarial generation module that modifies trajectories of risk-inducing vehicles (ego-attackers) to create critical scenarios. We validate our approach using a 2D simulation framework and multiple pre-trained LLMs. The results show that the evaluation module effectively detects collision scenarios and infers scenario safety. Meanwhile, the new generation module identifies high-risk agents and synthesizes realistic, safety-critical scenarios. We conclude that an LLM equipped with domain-informed prompting techniques can effectively evaluate and generate safety-critical driving scenarios, reducing dependence on handcrafted metrics. We release our open-source code and scenarios at: https://github.com/TUM-AVS/From-Words-to-Collisions.
Related papers
- LLM-based Realistic Safety-Critical Driving Video Generation [4.537331974356809]
We propose a framework that automatically synthesizes driving scenarios within the CARLA simulator.<n>The framework has flexibility in scenario scripting, efficient code-based control of traffic participants, and enforcement of realistic physical dynamics.<n>Our approach enables controllable scenario generation and facilitates the creation of rare but critical edge cases.
arXiv Detail & Related papers (2025-07-02T00:45:19Z) - ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models [60.28667314609623]
Large Language Models (LLMs) are increasingly deployed as black-box components in real-world applications.<n>We propose Reality-Oriented Safety Evaluation (ROSE), a novel framework that uses multi-objective reinforcement learning to fine-tune an adversarial LLM.
arXiv Detail & Related papers (2025-06-17T10:55:17Z) - SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z) - Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models [39.139025989575686]
We introduce an online, retrieval-augmented large language model (LLM) framework for generating safety-critical driving scenarios.<n>Our model reduces the mean minimum time-to-collision from 1.62 to 1.08 s and incurs a 75% collision rate, substantially outperforming baselines.
arXiv Detail & Related papers (2025-05-02T03:22:00Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise.
We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z) - TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [50.86790793565535]
TeLL-Drive is a hybrid framework that integrates an Teacher LLM to guide an attention-based Student DRL policy.<n>A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness across diverse driving conditions.
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model [10.741225574706]
AutoScenario is a framework for realistic corner case generation.<n>It converts safety-critical real-world data from multiple sources into textual representations.<n>It integrates tools from the Simulation of Urban Mobility (SUMO) and CARLA simulators.
arXiv Detail & Related papers (2024-11-29T20:23:28Z) - Generating Out-Of-Distribution Scenarios Using Language Models [58.47597351184034]
Large Language Models (LLMs) have shown promise in autonomous driving.
This paper introduces a framework for generating diverse Out-Of-Distribution (OOD) driving scenarios.
We evaluate our framework through extensive simulations and introduce a new "OOD-ness" metric.
arXiv Detail & Related papers (2024-11-25T16:38:17Z) - Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles [6.836108615628114]
We propose OmniTester: a framework that generates realistic and diverse scenarios within a simulation environment.
In the experiments, we demonstrated the controllability and realism of our approaches in generating three types of challenging and complex scenarios.
arXiv Detail & Related papers (2024-09-10T12:12:09Z) - ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles [17.396416459648755]
ChatScene is a Large Language Model (LLM)-based agent that generates safety-critical scenarios for autonomous vehicles.
A key part of our agent is a comprehensive knowledge retrieval component, which efficiently translates specific textual descriptions into corresponding domain-specific code snippets.
arXiv Detail & Related papers (2024-05-22T23:21:15Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z) - Generating and Characterizing Scenarios for Safety Testing of Autonomous
Vehicles [86.9067793493874]
We propose efficient mechanisms to characterize and generate testing scenarios using a state-of-the-art driving simulator.
We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project.
We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident.
arXiv Detail & Related papers (2021-03-12T17:00:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.