Offline Reinforcement Learning for End-to-End Autonomous Driving
- URL: http://arxiv.org/abs/2512.18662v1
- Date: Sun, 21 Dec 2025 09:21:04 GMT
- Title: Offline Reinforcement Learning for End-to-End Autonomous Driving
- Authors: Chihiro Noguchi, Takaki Yamamoto
- Abstract summary: End-to-end (E2E) autonomous driving models take only camera images as input and directly predict a future trajectory. Online reinforcement learning (RL) could mitigate IL-induced issues. We introduce a camera-only E2E offline RL framework that performs no additional exploration and trains solely on a fixed simulator dataset.
- Score: 1.2891210250935148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end (E2E) autonomous driving models that take only camera images as input and directly predict a future trajectory are appealing for their computational efficiency and potential for improved generalization via unified optimization; however, persistent failure modes remain due to reliance on imitation learning (IL). While online reinforcement learning (RL) could mitigate IL-induced issues, the computational burden of neural rendering-based simulation and large E2E networks renders iterative reward and hyperparameter tuning costly. We introduce a camera-only E2E offline RL framework that performs no additional exploration and trains solely on a fixed simulator dataset. Offline RL offers strong data efficiency and rapid experimental iteration, yet is susceptible to instability from overestimation on out-of-distribution (OOD) actions. To address this, we construct pseudo ground-truth trajectories from expert driving logs and use them as a behavior regularization signal, suppressing imitation of unsafe or suboptimal behavior while stabilizing value learning. Training and closed-loop evaluation are conducted in a neural rendering environment learned from the public nuScenes dataset. Empirically, the proposed method achieves substantial improvements in collision rate and route completion compared with IL baselines. Our code will be available at [URL].
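To make the behavior-regularization idea concrete, here is a minimal sketch of an actor-critic update in which a pseudo ground-truth trajectory from expert logs regularizes the policy while a Q-function is trained by TD learning on the fixed dataset. All names (policy, q_net, lambda_bc, pseudo_gt) and the TD3+BC-style loss form are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def offline_rl_step(policy, q_net, q_target, batch, optim_pi, optim_q,
                    lambda_bc=2.5, gamma=0.99):
    obs, act, rew, next_obs, done, pseudo_gt = batch  # pseudo_gt: expert-log trajectory

    # Critic: TD target computed only from in-dataset transitions.
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * q_target(next_obs, policy(next_obs))
    q_loss = F.mse_loss(q_net(obs, act), target)
    optim_q.zero_grad(); q_loss.backward(); optim_q.step()

    # Actor: maximize Q while staying close to the pseudo ground truth,
    # which suppresses value overestimation on out-of-distribution actions.
    pred = policy(obs)
    bc_loss = F.mse_loss(pred, pseudo_gt)           # behavior regularization signal
    pi_loss = -q_net(obs, pred).mean() + lambda_bc * bc_loss
    optim_pi.zero_grad(); pi_loss.backward(); optim_pi.step()
    return q_loss.item(), pi_loss.item()
```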
Related papers
- When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training [58.25341036646294]
We analytically and empirically examine why learning recurrent poles provides no tangible benefit in real-time online learning scenarios. We show that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
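A minimal sketch of the fixed-pole idea, assuming a diagonal linear recurrence whose decay rates (poles) are frozen buffers while only the input/output projections train; the pole placement in (0, 1) is an illustrative assumption, not the paper's design.

```python
import torch
import torch.nn as nn

class FixedPoleRNN(nn.Module):
    """Diagonal linear recurrence h_t = a * h_{t-1} + W x_t with the poles
    `a` fixed (never trained); only input/output projections learn."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        # Fixed stable poles spread in (0, 1); a buffer, so no gradients.
        self.register_buffer("a", torch.linspace(0.5, 0.99, d_hidden))
        self.inp = nn.Linear(d_in, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):                      # x: (batch, time, d_in)
        h = torch.zeros(x.size(0), self.a.numel(), device=x.device)
        ys = []
        for t in range(x.size(1)):
            h = self.a * h + self.inp(x[:, t])
            ys.append(self.out(h))
        return torch.stack(ys, dim=1)          # (batch, time, d_out)
```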
arXiv Detail & Related papers (2026-02-25T00:15:13Z) - Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning. Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
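As described, the core filter is simple; a sketch under the assumption that transitions are dicts with a scalar reward and that a zero threshold separates negatively rewarded samples:

```python
def rejection_sample(buffer, new_transitions, reward_threshold=0.0):
    """Online rejection sampling sketch: only transitions with non-negative
    reward enter the fine-tuning buffer, stabilizing value estimation by
    excluding negatively rewarded samples. The threshold is an assumption."""
    accepted = [t for t in new_transitions if t["reward"] >= reward_threshold]
    buffer.extend(accepted)
    return len(accepted)
```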
arXiv Detail & Related papers (2025-10-30T11:53:08Z) - ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring [52.195295396336526]
ZTRS (Zero-Imitation End-to-End Autonomous Driving with Trajectory Scoring) is a framework that combines the strengths of both worlds: raw sensor inputs, which lose no information, and RL training for robust planning. ZTRS demonstrates strong performance across three benchmarks: Navtest, Navhard, and HUGSIM.
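A hedged sketch of the trajectory-scoring step, assuming a learned scorer evaluated over a fixed vocabulary of candidate trajectories; the batched signature is an assumption for illustration:

```python
import torch

def plan_with_scoring(scorer, sensor_feats, traj_vocab):
    """Score every candidate trajectory against the sensor features and
    execute the argmax. `scorer` maps (features, candidates) to one scalar
    score per candidate; its exact interface is an assumption."""
    scores = scorer(sensor_feats, traj_vocab)   # (num_candidates,)
    return traj_vocab[scores.argmax()]
```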
arXiv Detail & Related papers (2025-10-28T06:26:36Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task as predicting the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
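A sketch of residual trajectory modeling, assuming the inertial reference is a constant-velocity extrapolation of the ego state (the paper's exact reference and normalization may differ):

```python
import torch

def inertial_reference(pos, vel, horizon, dt=0.1):
    """Constant-velocity extrapolation used as the reference trajectory
    (an assumption about the summary's 'inertial reference')."""
    steps = torch.arange(1, horizon + 1, dtype=pos.dtype).unsqueeze(-1) * dt
    return pos.unsqueeze(0) + steps * vel.unsqueeze(0)   # (horizon, 2)

def compose_trajectory(reference, residual, scale):
    # The network predicts a normalized residual; denormalize it and add
    # it back onto the reference to obtain the final trajectory.
    return reference + residual * scale
```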
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - First Order Model-Based RL through Decoupled Backpropagation [10.963895023346879]
We propose an approach that decouples trajectory generation from gradient computation. Our method achieves the sample efficiency and speed of specialized first-order methods such as SHAC. We empirically validate our algorithm on benchmark control tasks and demonstrate its effectiveness on a real Go2 quadruped robot.
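A rough sketch of the decoupling, assuming states are collected from a rollout without tracking gradients while a differentiable one-step model supplies first-order gradients locally; diff_step and reward_fn are assumed interfaces, not the paper's API:

```python
def decoupled_fo_grad(policy, diff_step, states, reward_fn):
    """First-order gradient sketch: trajectory states come from a (possibly
    non-differentiable) simulator rollout; gradients are recomputed at each
    visited state through a differentiable one-step model."""
    loss = 0.0
    for s in states:                        # states collected off-graph
        a = policy(s)
        s_next = diff_step(s, a)            # differentiable one-step dynamics
        loss = loss - reward_fn(s_next, a)  # first-order gradient flows here
    loss.backward()
```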
arXiv Detail & Related papers (2025-08-29T19:55:25Z) - RIFT: Group-Relative RL Fine-Tuning for Realistic and Controllable Traffic Simulation [13.319344167881383]
We introduce a dual-stage AV-centric simulation framework that conducts imitation learning pre-training in a data-driven simulator, followed by fine-tuning in a physics-based simulator to enhance style-level controllability. In the fine-tuning stage, we propose RIFT, a novel group-relative RL fine-tuning strategy.
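A minimal sketch of a group-relative baseline in the spirit of GRPO-style fine-tuning; RIFT's exact objective may differ:

```python
import torch

def group_relative_advantage(rewards):
    """Advantage of each rollout in a group is its reward minus the group
    mean, normalized by the group std (an illustrative assumption)."""
    r = torch.as_tensor(rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std() + 1e-8)
```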
arXiv Detail & Related papers (2025-05-06T09:12:37Z) - Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [57.278726604424556]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive, human-like handling.
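A sketch of analytic policy gradients through a differentiable simulator step, with sim_step and cost_fn as assumed differentiable interfaces:

```python
def apg_loss(policy, sim_step, s0, horizon, cost_fn):
    """Unroll the policy through the differentiable simulator and sum the
    per-step cost; backpropagating this loss sends gradients through the
    environment dynamics themselves."""
    s, loss = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s = sim_step(s, a)       # gradients flow through the dynamics
        loss = loss + cost_fn(s, a)
    return loss
```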
arXiv Detail & Related papers (2024-09-12T11:50:06Z) - Improving Offline Reinforcement Learning with Inaccurate Simulators [34.54402525918925]
We propose a novel approach that better combines the offline dataset with inaccurate simulation data.
Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset.
Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and limited offline datasets, achieving better performance than state-of-the-art methods.
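A sketch of how the pre-trained discriminator could reweight simulator samples toward the offline state distribution; the density-ratio-style weighting rule is an illustrative assumption:

```python
import torch

def weight_sim_samples(discriminator, sim_states):
    """Score simulator states with a discriminator trained to recognize the
    offline state distribution; states that look in-distribution receive
    larger sample weights."""
    with torch.no_grad():
        d = torch.sigmoid(discriminator(sim_states))  # P(state ~ offline data)
    return d / (1.0 - d + 1e-6)                       # density-ratio style weight
```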
arXiv Detail & Related papers (2024-05-07T13:29:41Z) - Offline Trajectory Optimization for Offline Reinforcement Learning [42.306438854850434]
Offline reinforcement learning aims to learn policies without online exploration. Existing data augmentation methods for offline RL suffer from issues such as trivial improvement from short-horizon simulation. We propose Offline Trajectory Optimization for offline reinforcement learning (OTTO).
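A sketch of world-model-based trajectory augmentation in the spirit of the summary, with world_model.step as an assumed interface:

```python
def augment_with_world_model(world_model, policy, dataset_states, horizon=5):
    """Roll the policy out inside a learned world model from real dataset
    states to synthesize longer-horizon transitions for augmentation."""
    synthetic = []
    for s in dataset_states:
        for _ in range(horizon):
            a = policy(s)
            s_next, r = world_model.step(s, a)   # learned dynamics + reward
            synthetic.append((s, a, r, s_next))
            s = s_next
    return synthetic
```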
arXiv Detail & Related papers (2024-04-16T08:48:46Z) - Adaptive Behavior Cloning Regularization for Stable Offline-to-Online
Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
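A hedged sketch of adaptively weighting the BC loss from recent return statistics: the weight rises when returns are unstable and anneals as training stabilizes. The specific schedule (coefficient of variation, clipping) is an assumption, not the paper's rule.

```python
import statistics

def adaptive_bc_weight(returns, window=10, w_min=0.0, w_max=1.0):
    """Map instability of recent online returns to a BC-loss weight."""
    recent = returns[-window:]
    if len(recent) < 2:
        return w_max            # too little evidence: rely fully on BC
    # Coefficient of variation of recent returns as an instability proxy.
    instability = statistics.stdev(recent) / (abs(statistics.mean(recent)) + 1e-8)
    return min(w_max, max(w_min, instability))
```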
arXiv Detail & Related papers (2022-10-25T09:08:26Z)