Optimization-Guided Diffusion for Interactive Scene Generation
- URL: http://arxiv.org/abs/2512.07661v2
- Date: Thu, 11 Dec 2025 15:08:39 GMT
- Title: Optimization-Guided Diffusion for Interactive Scene Generation
- Authors: Shihao Li, Naisheng Ye, Tianyu Li, Kashyap Chitta, Tuo An, Peng Su, Boyang Wang, Haiou Liu, Chen Lv, Hongyang Li
- Abstract summary: We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling. We show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes. Our approach can also generate $5\times$ more near-collision frames with a time-to-collision under three seconds.
- Score: 52.23368750264419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realistic and diverse multi-agent driving scenes are crucial for evaluating autonomous vehicles, but safety-critical events, which are essential for this task, are rare and underrepresented in driving datasets. Data-driven scene generation offers a low-cost alternative by synthesizing complex traffic behaviors from existing driving logs. However, existing models often lack controllability or yield samples that violate physical or social constraints, limiting their usability. We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling from a scene generation model. OMEGA re-anchors each reverse diffusion step via constrained optimization, steering the generation towards physically plausible and behaviorally coherent trajectories. Building on this framework, we formulate ego-attacker interactions as a game-theoretic optimization in the distribution space, approximating Nash equilibria to generate realistic, safety-critical adversarial scenarios. Experiments on nuPlan and Waymo show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes from 32.35% to 72.27% for free exploration capabilities, and from 11% to 80% for controllability-focused generation. Our approach can also generate $5\times$ more near-collision frames with a time-to-collision under three seconds while maintaining the overall scene realism.
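The abstract's near-collision metric counts frames whose time-to-collision (TTC) is under three seconds. This listing does not give the paper's exact TTC definition, so the following is only a minimal sketch under stated assumptions: two agents are approximated as discs with a combined radius, both move at constant velocity, and the function names `time_to_collision` and `count_near_collision_frames` are hypothetical.

```python
import math


def time_to_collision(rel_pos, rel_vel, radius):
    """Earliest t >= 0 at which two constant-velocity discs touch.

    rel_pos, rel_vel: position and velocity of agent B relative to agent A.
    radius: sum of the two disc radii (an assumed footprint model).
    Returns math.inf if the discs never touch.
    """
    px, py = rel_pos
    vx, vy = rel_vel
    # Solve |p + v t|^2 = radius^2, i.e. a t^2 + b t + c = 0.
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    c = px * px + py * py - radius * radius
    if c <= 0.0:
        return 0.0  # already overlapping
    if a == 0.0:
        return math.inf  # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0 or b >= 0.0:
        return math.inf  # paths miss, or agents are moving apart
    return (-b - math.sqrt(disc)) / (2.0 * a)


def count_near_collision_frames(frames, radius=2.0, threshold_s=3.0):
    """Count frames whose TTC is below threshold_s (three seconds here)."""
    return sum(
        1
        for rel_pos, rel_vel in frames
        if time_to_collision(rel_pos, rel_vel, radius) < threshold_s
    )


frames = [
    ((20.0, 0.0), (-5.0, 0.0)),  # closing, TTC above the threshold
    ((12.0, 0.0), (-5.0, 0.0)),  # closing, TTC below the threshold
    ((12.0, 0.0), (5.0, 0.0)),   # moving apart, never collides
]
near = count_near_collision_frames(frames)
```

A real evaluation would likely use oriented bounding boxes and the datasets' own agent footprints rather than discs; the disc model is chosen here only to keep the arithmetic easy to check by hand.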
Related papers
- DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving [49.11389494068169]
We present DrivingGen, the first comprehensive benchmark for generative driving world models. DrivingGen combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources. General models look better but break physics, while driving-specific ones capture motion realistically but lag in visual quality.
arXiv Detail & Related papers (2026-01-04T13:36:21Z)
- A Trajectory Generator for High-Density Traffic and Diverse Agent-Interaction Scenarios [37.38654549322757]
We propose a novel trajectory generation framework that simultaneously enhances scenario density and enriches behavioral diversity. Our method significantly improves both agent density and behavior diversity, while preserving motion realism and scenario-level safety. Our synthetic data also benefits downstream trajectory prediction models and enhances performance in challenging high-density scenarios.
arXiv Detail & Related papers (2025-10-03T00:12:18Z)
- World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving [1.8277374107085946]
We propose a comprehensive framework combining generative scene augmentation with adaptive temporal reasoning. We develop a video generation pipeline that utilizes a world model guided by domain-informed prompts to create high-resolution, statistically consistent driving scenarios. In parallel, we construct a dynamic prediction model that encodes spatio-temporal relationships through strengthened graph convolutions and dilated temporal operators.
arXiv Detail & Related papers (2025-07-17T03:34:54Z)
- RealEngine: Simulating Autonomous Driving in Realistic Context [60.55873455475112]
RealEngine is a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques. By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios. RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction.
arXiv Detail & Related papers (2025-05-22T17:01:00Z)
- Safety-Critical Traffic Simulation with Guided Latent Diffusion Model [8.011306318131458]
Safety-critical traffic simulation plays a crucial role in evaluating autonomous driving systems. We propose a guided latent diffusion model (LDM) capable of generating physically realistic and adversarial scenarios. Our work provides an effective tool for realistic safety-critical scenario simulation, paving the way for more robust evaluation of autonomous driving systems.
arXiv Detail & Related papers (2025-05-01T13:33:34Z)
- From Imitation to Exploration: End-to-end Autonomous Driving based on World Model [24.578178308010912]
RAMBLE is an end-to-end world model-based RL method for driving decision-making. It can handle complex and dynamic traffic scenarios. It achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0.
arXiv Detail & Related papers (2024-10-03T06:45:59Z)
- Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors [2.773055342671194]
We introduce a natural adversarial scenario generation solution using naturalistic human driving priors and reinforcement learning techniques.
Our findings demonstrate that the proposed model can generate realistic safety-critical test scenarios covering both naturalness and adversariality.
arXiv Detail & Related papers (2024-08-06T13:58:56Z)
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)