Related papers: RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

URL: http://arxiv.org/abs/2502.13144v2
Date: Tue, 21 Oct 2025 03:19:21 GMT
Title: RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang
Abstract summary: We propose RAD, a 3DGS-based closed-loop Reinforcement Learning framework for end-to-end Autonomous Driving. To enhance safety, we design specialized rewards to guide the policy in effectively responding to safety-critical events and understanding real-world causal relationships. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, particularly exhibiting a 3x lower collision rate.
Score: 54.52545900359868
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and an open-loop gap. In this work, we propose RAD, a 3DGS-based closed-loop Reinforcement Learning (RL) framework for end-to-end Autonomous Driving. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards to guide the policy in effectively responding to safety-critical events and understanding real-world causal relationships. To better align with human driving behavior, we incorporate IL into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, particularly exhibiting a 3x lower collision rate. Abundant closed-loop results are presented in the supplementary material. Code is available at https://github.com/hustvl/RAD for facilitating future research.

Related papers

ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring [52.195295396336526]
ZTRS (Zero-Imitation End-to-End Autonomous Driving with Trajectory Scoring) is a framework that combines the strengths of both worlds: sensor inputs without losing information and RL training for robust planning. ZTRS demonstrates strong performance across three benchmarks: Navtest, Navhard, and HUGSIM.
arXiv Detail & Related papers (2025-10-28T06:26:36Z)
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [49.07731497951963]
ReCogDrive is a novel Reinforced Cognitive framework for end-to-end autonomous driving. We introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers. We then address the language-action mismatch by injecting the VLM's learned driving priors into a diffusion planner.
arXiv Detail & Related papers (2025-06-09T03:14:04Z)
From Imitation to Exploration: End-to-end Autonomous Driving based on World Model [24.578178308010912]
RAMBLE is an end-to-end world model-based RL method for driving decision-making. It can handle complex and dynamic traffic scenarios. It achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0.
arXiv Detail & Related papers (2024-10-03T06:45:59Z)
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable [88.08120417169971]
Machine learning based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data. This work explores generating safety-critical driving scenarios by modifying complex real-world regular scenarios through trajectory optimization. Our approach addresses unrealistic diverging trajectories and unavoidable collision scenarios that are not useful for training a robust planner.
arXiv Detail & Related papers (2024-09-12T08:26:33Z)
CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving [45.05135725542318]
The Combining IMitation and Reinforcement Learning (CIMRL) approach enables training driving policies in simulation by leveraging imitative motion priors and safety constraints. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.
arXiv Detail & Related papers (2024-06-13T07:31:29Z)
Demystifying the Physics of Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making [6.243971093896272]
We use a continuous proximal policy optimization-based DRL algorithm as the baseline model and add a multi-head attention framework in an open-source AV simulation environment. We show that the weights in the first head encode the positions of the neighboring vehicles while the second head focuses on the leader vehicle exclusively.
arXiv Detail & Related papers (2024-03-18T02:59:13Z)
Learning Realistic Traffic Agents in Closed-loop [36.38063449192355]
Reinforcement learning (RL) can train traffic agents to avoid infractions, but using RL alone results in unhuman-like driving behaviors. We propose Reinforcing Traffic Rules (RTR) to match expert demonstrations under a traffic compliance constraint. Our experiments show that RTR learns more realistic and generalizable traffic simulation policies.
arXiv Detail & Related papers (2023-11-02T16:55:23Z)
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. In this paper, we propose an adaptive scheme for action quantization. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy. In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks. We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
Carl-Lead: Lidar-based End-to-End Autonomous Driving with Contrastive Deep Reinforcement Learning [10.040113551761792]
In this work, we use deep reinforcement learning (DRL) to train lidar-based end-to-end driving policies that naturally consider imperfect partial observations. Our method achieves higher success rates than the state-of-the-art (SOTA) lidar-based end-to-end driving network.
arXiv Detail & Related papers (2021-09-17T11:24:10Z)
CLAMGen: Closed-Loop Arm Motion Generation via Multi-view Vision-Based RL [4.014524824655106]
We propose a vision-based reinforcement learning (RL) approach for closed-loop trajectory generation in an arm reaching problem. Arm trajectory generation is a fundamental robotics problem which entails finding collision-free paths to move the robot's body.
arXiv Detail & Related papers (2021-03-24T15:33:03Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO). We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving [0.0]
We propose a hybrid approach for integrating a path planning pipe into a vision-based DRL framework. In summary, the DRL agent is trained to follow the path planner's waypoints as closely as possible. Experimental results show that the proposed method can plan its path and navigate between randomly chosen origin-destination points.
arXiv Detail & Related papers (2020-02-02T17:10:19Z)
Intelligent Roundabout Insertion using Deep Reinforcement Learning [68.8204255655161]
We present a maneuver planning module able to negotiate entry into busy roundabouts. The proposed module is based on a neural network trained to predict when and how to enter the roundabout throughout the whole duration of the maneuver.
arXiv Detail & Related papers (2020-01-03T11:16:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.