Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
- URL: http://arxiv.org/abs/2412.05334v2
- Date: Fri, 14 Mar 2025 09:11:40 GMT
- Title: Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
- Authors: Zhejun Zhang, Peter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, Marco Pavone
- Abstract summary: Tokenized multi-agent policies have recently become the state-of-the-art in traffic simulation. They are typically trained through open-loop behavior cloning. We present Closest Among Top-K (CAT-K) rollouts, a simple yet effective closed-loop fine-tuning strategy.
- Score: 32.51871127681948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traffic simulation aims to learn a policy for traffic agents that, when unrolled in closed-loop, faithfully recovers the joint distribution of trajectories observed in the real world. Inspired by large language models, tokenized multi-agent policies have recently become the state-of-the-art in traffic simulation. However, they are typically trained through open-loop behavior cloning, and thus suffer from covariate shift when executed in closed-loop during simulation. In this work, we present Closest Among Top-K (CAT-K) rollouts, a simple yet effective closed-loop fine-tuning strategy to mitigate covariate shift. CAT-K fine-tuning only requires existing trajectory data, without reinforcement learning or generative adversarial imitation. Concretely, CAT-K fine-tuning enables a small 7M-parameter tokenized traffic simulation policy to outperform a 102M-parameter model from the same model family, achieving the top spot on the Waymo Sim Agent Challenge leaderboard at the time of submission. The code is available at https://github.com/NVlabs/catk.
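The abstract names the technique but does not spell out the rollout itself. Reading "Closest Among Top-K" literally, one plausible sketch is: at each step, take the K most likely motion tokens under the current policy and apply the one that lands closest to the logged ground-truth state. The code below is a toy illustration under that assumption, not the authors' implementation (see the repository linked above for that). The names `cat_k_step`, `cat_k_rollout`, `toy_policy`, and `TOKEN_MOVES`, the 2D displacement vocabulary, and the Euclidean distance are all hypothetical.

```python
import math

# Hypothetical token vocabulary: each token is a fixed 2D displacement.
TOKEN_MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def cat_k_step(state, gt_next, token_probs, k):
    """Among the k most likely tokens, pick the one whose next state is closest to gt_next."""
    # Indices of the k highest-probability tokens under the policy.
    top_k = sorted(range(len(token_probs)), key=lambda i: token_probs[i], reverse=True)[:k]

    def distance_to_gt(i):
        dx, dy = TOKEN_MOVES[i]
        return math.hypot(state[0] + dx - gt_next[0], state[1] + dy - gt_next[1])

    best = min(top_k, key=distance_to_gt)
    dx, dy = TOKEN_MOVES[best]
    return best, (state[0] + dx, state[1] + dy)


def cat_k_rollout(start, gt_traj, policy, k):
    """Unroll the policy in closed loop, staying near the ground truth via CAT-K selection."""
    state, tokens, states = start, [], [start]
    for gt_next in gt_traj:
        probs = policy(state)  # the policy sees its own previous output, not the logged state
        token, state = cat_k_step(state, gt_next, probs, k)
        tokens.append(token)
        states.append(state)
    return tokens, states


def toy_policy(state):
    """Stand-in policy that always prefers token 0 (move right)."""
    return [0.4, 0.3, 0.2, 0.1]


gt = [(0, 1), (0, 2)]
print(cat_k_rollout((0, 0), gt, toy_policy, k=2))  # ([1, 1], [(0, 0), (0, 1), (0, 2)])
print(cat_k_rollout((0, 0), gt, toy_policy, k=1))  # ([0, 0], [(0, 0), (1, 0), (2, 0)])
```

In this sketch, K = 1 reduces to following the policy's most likely token, so the rollout drifts away from the logged trajectory, while a larger K lets it stay close. Presumably the states visited this way then serve as inputs for supervised fine-tuning, which is how such a scheme would expose the policy to its own closed-loop state distribution using only existing trajectory data.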
Related papers
- RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies [30.632104005565832]
Rollouts as Demonstrations (RoaD) is a method to mitigate covariate shift when training autonomous driving policies in closed loop. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method.
arXiv Detail & Related papers (2025-12-01T18:52:03Z) - SPACeR: Self-Play Anchoring with Centralized Reference Models [50.55045557371374]
Sim agent policies are realistic, human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a central reference policy.
arXiv Detail & Related papers (2025-10-20T19:53:02Z) - IntTrajSim: Trajectory Prediction for Simulating Multi-Vehicle driving at Signalized Intersections [8.484294935626224]
Traffic simulators are widely used to study the operational efficiency of road infrastructure. Their rule-based approach limits their ability to mimic real-world driving behavior. We propose traffic engineering-related metrics to evaluate generative trajectory prediction models.
arXiv Detail & Related papers (2025-06-10T16:27:42Z) - Rolling Ahead Diffusion for Traffic Scene Simulation [13.900806577888861]
Realistic driving simulation requires that NPCs not only mimic natural driving behaviors but also react to the behavior of other simulated agents.
Recent developments in diffusion-based scenario generation focus on creating diverse and realistic traffic scenarios.
We present a rolling diffusion based traffic scene generation model which mixes the benefits of both methods.
arXiv Detail & Related papers (2025-02-13T18:45:56Z) - Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z) - NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z) - BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction [22.254486248785614]
BehaviorGPT is a homogeneous and fully autoregressive Transformer designed to simulate the sequential behavior of multiple agents.
We introduce the Next-Patch Prediction Paradigm (NP3) to mitigate the negative effects of autoregressive modeling.
BehaviorGPT won first place in the 2024 Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147.
arXiv Detail & Related papers (2024-05-27T17:28:25Z) - Tractable Joint Prediction and Planning over Discrete Behavior Modes for
Urban Driving [15.671811785579118]
We show that we can parameterize autoregressive closed-loop models without retraining.
We propose fully reactive closed-loop planning over discrete latent modes.
Our approach also outperforms the previous state-of-the-art in CARLA on challenging dense traffic scenarios.
arXiv Detail & Related papers (2024-03-12T01:00:52Z) - SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for
Autonomous Driving [27.776472262857045]
This paper presents a Simple and effIcient Motion Prediction baseLine (SIMPL) for autonomous vehicles.
We propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner.
As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks.
arXiv Detail & Related papers (2024-02-04T15:07:49Z) - CAT: Closed-loop Adversarial Training for Safe End-to-End Driving [54.60865656161679]
Closed-loop Adversarial Training (CAT) is a framework for safe end-to-end driving in autonomous vehicles.
CAT aims to continuously improve the safety of driving agents by training the agent on safety-critical scenarios.
CAT can effectively generate adversarial scenarios countering the agent being trained.
arXiv Detail & Related papers (2023-10-19T02:49:31Z) - AdaCat: Adaptive Categorical Discretization for Autoregressive Models [84.85102013917606]
We propose an efficient, expressive, multimodal parameterization called Adaptive Categorical Discretization (AdaCat).
AdaCat discretizes each dimension of an autoregressive model adaptively, which allows the model to allocate density to fine intervals of interest.
arXiv Detail & Related papers (2022-08-03T17:53:46Z) - TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z) - Learning from Simulation, Racing in Reality [126.56346065780895]
We present a reinforcement learning-based solution to autonomously race on a miniature race car platform.
We show that a policy that is trained purely in simulation can be successfully transferred to the real robotic setup.
arXiv Detail & Related papers (2020-11-26T14:58:49Z) - Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving [0.0]
We propose a hybrid approach for integrating a path planning pipe into a vision based DRL framework.
In summary, the DRL agent is trained to follow the path planner's waypoints as closely as possible.
Experimental results show that the proposed method can plan its path and navigate between randomly chosen origin-destination points.
arXiv Detail & Related papers (2020-02-02T17:10:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.