Related papers: RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies

URL: http://arxiv.org/abs/2512.01993v1
Date: Mon, 01 Dec 2025 18:52:03 GMT
Title: RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies
Authors: Guillermo Garcia-Cobo, Maximilian Igl, Peter Karkus, Zhejun Zhang, Michael Watson, Yuxiao Chen, Boris Ivanovic, Marco Pavone
Abstract summary: Rollouts as Demonstrations (RoaD) is a method to mitigate covariate shift when training autonomous driving policies in closed loop. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method.
Score: 30.632104005565832
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy's own closed-loop rollouts as additional training data. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. This approach enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning, and avoids restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods, allowing broader application domains including end-to-end driving. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method; and in AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, where it improves driving score by 41% and reduces collisions by 54%.
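
Illustrative sketch: the abstract describes generating closed-loop rollouts with expert guidance and reusing them as training data. The toy Python below shows one way such a loop could look, and it is not the authors' implementation. Everything in it is an assumption made for illustration: the 1-D state, the hypothetical names `expert_guided_rollout`, `policy`, and `step`, and in particular the guidance rule (sample `k` candidate actions and keep the one whose next state is closest to the expert's), which resembles the Closest Among Top-K idea named in the related-papers list and may differ from the guidance RoaD actually uses.

```python
import random


def expert_guided_rollout(policy, expert_states, step, k=8, rng=None):
    """Roll a policy out in closed loop and record (state, action) pairs.

    Assumed guidance rule (not taken from the paper): at every step, sample
    k candidate actions from the policy and keep the one whose resulting
    state is closest to the expert's next state.
    """
    rng = rng or random.Random(0)
    state = expert_states[0]  # start where the expert started
    demos = []
    for expert_next in expert_states[1:]:
        candidates = [policy(state, rng) for _ in range(k)]
        action = min(candidates, key=lambda a: abs(step(state, a) - expert_next))
        demos.append((state, action))  # states are the policy's own, not the expert's
        state = step(state, action)    # closed loop: continue from the chosen action
    return demos


def toy_policy(state, rng):
    # Hypothetical noisy policy: tries to move forward by about 1 unit per step.
    return rng.gauss(1.0, 0.5)


def toy_step(state, action):
    # Hypothetical 1-D dynamics: the action is a displacement.
    return state + action


if __name__ == "__main__":
    expert = [0.0, 1.0, 2.0, 3.0, 4.0]
    demos = expert_guided_rollout(toy_policy, expert, toy_step)
    print(len(demos), "rollout pairs collected for supervised fine-tuning")
```

In this sketch the recorded pairs would be added to the supervised training set, so the policy is fine-tuned on states it reaches itself rather than only on states from the human demonstrations.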

Related papers

TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data [40.3157492247442]
Existing end-to-end autonomous driving methods typically rely on imitation learning (IL). This misalignment often triggers driver-initiated takeovers and system disengagements during closed-loop execution. We propose TakeAD, a preference-based post-optimization framework that fine-tunes the pre-trained IL policy with this disengagement data.
arXiv Detail & Related papers (2025-12-19T09:12:44Z)
Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving [54.46325690390831]
We propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment. MPA first generates diverse counterfactual trajectories using a geometry-consistent simulation engine. MPA trains a diffusion-based policy adapter to refine the base policy's predictions and a multi-step Q value model to evaluate long-term outcomes.
arXiv Detail & Related papers (2025-11-26T17:01:41Z)
Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning [56.47948583452555]
We introduce the Stepwise Flow Policy (SWFP) framework, founded on the key insight that discretizing the flow matching inference process via a fixed-step Euler scheme aligns it with the variational Jordan-Kinderlehrer-Otto principle from optimal transport. SWFP decomposes the global flow into a sequence of small, incremental transformations between proximate distributions. This decomposition yields an efficient algorithm that fine-tunes pre-trained flows via a cascade of small flow blocks, offering significant advantages.
arXiv Detail & Related papers (2025-10-17T07:43:51Z)
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training [64.16445087751039]
Hydra-NeXt is a novel multi-branch planning framework that unifies trajectory prediction, control prediction, and a trajectory refinement network in one model. Hydra-NeXt surpasses the previous state-of-the-art by 22.98 DS and 17.49 SR, marking a significant advancement in autonomous driving.
arXiv Detail & Related papers (2025-03-15T07:42:27Z)
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning [54.52545900359868]
We propose RAD, a 3DGS-based closed-loop Reinforcement Learning framework for end-to-end Autonomous Driving. To enhance safety, we design specialized rewards to guide the policy in effectively responding to safety-critical events and understanding real-world causal relationships. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, particularly exhibiting a 3x lower collision rate.
arXiv Detail & Related papers (2025-02-18T18:59:21Z)
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models [32.51871127681948]
Tokenized multi-agent policies have recently become the state-of-the-art in traffic simulation. They are typically trained through open-loop behavior cloning. We present Closest Among Top-K (CAT-K) rollouts, a simple yet effective closed-loop fine-tuning strategy.
arXiv Detail & Related papers (2024-12-05T21:00:21Z)
SoftCTRL: Soft conservative KL-control of Transformer Reinforcement Learning for Autonomous Driving [0.6906005491572401]
We introduce a method that combines IL with reinforcement learning (RL) using an implicit entropy-KL control that offers a simple way to reduce the over-conservation characteristic. In particular, we validate on different challenging simulated urban scenarios from the unseen dataset, indicating that although IL can perform well in imitation tasks, our proposed method significantly improves robustness (over 17% reduction in failures) and generates human-like driving behavior.
arXiv Detail & Related papers (2024-10-30T07:18:00Z)
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [57.278726604424556]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
A Tricycle Model to Accurately Control an Autonomous Racecar with Locked Differential [71.53284767149685]
We present a novel formulation to model the effects of a locked differential on the lateral dynamics of an autonomous open-wheel racecar. We include a micro-steps discretization approach to accurately linearize the dynamics and produce a prediction suitable for real-time implementation.
arXiv Detail & Related papers (2023-12-22T16:29:55Z)
Bi-Level Optimization Augmented with Conditional Variational Autoencoder for Autonomous Driving in Dense Traffic [0.9281671380673306]
This paper presents a parameterized bi-level optimization that jointly computes the optimal behavioural decisions and the resulting trajectory. Our approach runs in real-time using a custom GPU-accelerated batch, and a Variational Autoencoder learnt warm-start strategy. Our approach outperforms state-of-the-art model predictive control and RL approaches in terms of collision rate while being competitive in driving efficiency.
arXiv Detail & Related papers (2022-12-05T12:56:42Z)
Carl-Lead: Lidar-based End-to-End Autonomous Driving with Contrastive Deep Reinforcement Learning [10.040113551761792]
We use deep reinforcement learning (DRL) to train lidar-based end-to-end driving policies that naturally consider imperfect partial observations. Our method achieves higher success rates than the state-of-the-art (SOTA) lidar-based end-to-end driving network.
arXiv Detail & Related papers (2021-09-17T11:24:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.