Related papers: Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model

Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model

URL: http://arxiv.org/abs/2412.09647v1
Date: Wed, 11 Dec 2024 06:35:18 GMT
Title: Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model
Authors: Junqi You, Xiaosong Jia, Zhiyuan Zhang, Yutao Zhu, Junchi Yan,
Abstract summary: We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation.<n>Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation.<n>We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
Score: 63.336123527432136
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: For end-to-end autonomous driving (E2E-AD), the evaluation system remains an open problem. Existing closed-loop evaluation protocols usually rely on simulators like CARLA being less realistic; while NAVSIM using real-world vision data, yet is limited to fixed planning trajectories in short horizon and assumes other agents are not reactive. We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation. Unlike existing video generative models for AD, the proposed designs are tailored for interactive simulation, where sensor rendering and behavior rollout are decoupled by applying a separate behavioral controller to simulate the reactions of surrounding agents. As a result, the renderer could focus on image fidelity, control adherence, and spatial-temporal coherence. For temporal consistency, due to the step-wise interaction nature of simulation, we design a noise modulating temporal encoder with Gaussian blurring to encourage long-horizon autoregressive rollout of image sequences without deteriorating distribution shifts. For spatial consistency, a retrieval mechanism, which takes the spatially nearest images as references, is introduced to to ensure scene-level rendering fidelity during the generation process. The spatial relations between target and reference are explicitly modeled with 3D relative position encodings and the potential over-reliance of reference images is mitigated with hierarchical sampling and classifier-free guidance. We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance. We further integrate Bench2Drive-R into nuPlan and evaluate the generative qualities with closed-loop simulation results. We will open source our code.

Related papers

ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving [2.6749009435602122]
World models have emerged as promising neural simulators for autonomous driving.<n>We develop an open-access evaluation framework, ACT-Bench, for quantifying action fidelity.<n>We demonstrate that the state-of-the-art model does not fully adhere to given instructions, while Terra achieves improved action fidelity.
arXiv Detail & Related papers (2024-12-06T01:06:28Z)
CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration [2.400446821380503]
Image-to-point cloud registration aims to determine the relative camera pose of an RGB image with respect to a point cloud. Most learning-based methods establish 2D-3D point correspondences in feature space without any feedback mechanism for iterative optimization. We propose to reformulate the registration procedure as an iterative Markov decision process, allowing for incremental adjustments to the camera pose.
arXiv Detail & Related papers (2024-08-05T11:40:59Z)
Planning with Adaptive World Models for Autonomous Driving [50.4439896514353]
Motion planners (MPs) are crucial for safe navigation in complex urban environments. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic. We present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions.
arXiv Detail & Related papers (2024-06-15T18:53:45Z)
SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models [10.057312592344507]
We propose a novel framework based on diffusion models, called SceneDM, to generate joint and consistent future motions of all the agents in a scene. SceneDM achieves state-of-the-art results on the Sim Agents Benchmark.
arXiv Detail & Related papers (2023-11-27T11:39:27Z)
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion [51.11295961195151]
We exploit the characteristics of the foggy image sequence of driving scenes to densify the confident pseudo labels. Based on the two discoveries of local spatial similarity and adjacent temporal correspondence of the sequential image data, we propose a novel Target-Domain driven pseudo label Diffusion scheme. Our scheme helps the adaptive model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets.
arXiv Detail & Related papers (2022-06-10T05:16:50Z)
Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform. We produce a closed-loop controller to reactively push objects in a continuous action space. We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates. The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.