Related papers: Bootstrapping Reinforcement Learning with Sub-optimal Policies for Autonomous Driving

URL: http://arxiv.org/abs/2509.04712v1
Date: Thu, 04 Sep 2025 23:56:26 GMT
Title: Bootstrapping Reinforcement Learning with Sub-optimal Policies for Autonomous Driving
Authors: Zhihao Zhang, Chengyang Peng, Ekim Yurtsever, Keith A. Redmill
Abstract summary: We propose guiding the RL driving agent with a demonstration policy that need not be a highly optimized or expert-level controller. We integrate a rule-based lane change controller with the Soft Actor Critic (SAC) algorithm to enhance exploration and learning efficiency. Our approach demonstrates improved driving performance and can be extended to other driving scenarios that can similarly benefit from demonstration-based guidance.
Score: 4.74407831153952
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated vehicle control using reinforcement learning (RL) has attracted significant attention due to its potential to learn driving policies through environment interaction. However, RL agents often face training challenges in sample efficiency and effective exploration, making it difficult to discover an optimal driving strategy. To address these issues, we propose guiding the RL driving agent with a demonstration policy that need not be a highly optimized or expert-level controller. Specifically, we integrate a rule-based lane change controller with the Soft Actor Critic (SAC) algorithm to enhance exploration and learning efficiency. Our approach demonstrates improved driving performance and can be extended to other driving scenarios that can similarly benefit from demonstration-based guidance.

Related papers

TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy. A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z)
Robust Driving Policy Learning with Guided Meta Reinforcement Learning [49.860391298275616]
We introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy. We propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy.
arXiv Detail & Related papers (2023-07-19T17:42:36Z)
Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers [0.4241054493737716]
This study implements, evaluates, and compares two DRL algorithms, Deep Q-networks (DQN) and Trust Region Policy Optimization (TRPO). Models trained on the designed ComplexRoads environment can adapt well to other driving maneuvers with promising overall performance.
arXiv Detail & Related papers (2023-06-20T11:41:01Z)
Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving [6.613838702441967]
This paper investigates how to use risk-aware reward shaping to leverage the training and test performance of RL agents in autonomous driving. We propose additional reshaped reward terms that encourage exploration and penalize risky driving behaviors.
arXiv Detail & Related papers (2023-06-05T20:10:36Z)
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing [71.76084256567599]
We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.
arXiv Detail & Related papers (2023-04-19T17:33:47Z)
Unified Automatic Control of Vehicular Systems with Reinforcement Learning [64.63619662693068]
This article contributes a streamlined methodology for vehicular microsimulation. It discovers high performance control strategies with minimal manual design. The study reveals numerous emergent behaviors resembling wave mitigation, traffic signaling, and ramp metering.
arXiv Detail & Related papers (2022-07-30T16:23:45Z)
Learning energy-efficient driving behaviors by imitating experts [75.12960180185105]
This paper examines the role of imitation learning in bridging the gap between control strategies and realistic limitations in communication and sensing. We show that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations.
arXiv Detail & Related papers (2022-06-28T17:08:31Z)
DriverGym: Democratising Reinforcement Learning for Autonomous Driving [75.91049219123899]
We propose DriverGym, an open-source environment for developing reinforcement learning algorithms for autonomous driving. DriverGym provides access to more than 1000 hours of expert logged data and also supports reactive and data-driven agent behavior. The performance of an RL policy can be easily validated on real-world data using our extensive and flexible closed-loop evaluation protocol.
arXiv Detail & Related papers (2021-11-12T11:47:08Z)
Affordance-based Reinforcement Learning for Urban Driving [3.507764811554557]
We propose a deep reinforcement learning framework to learn an optimal control policy using waypoints and low-dimensional visual representations. We demonstrate that our agents, when trained from scratch, learn the tasks of lane-following, driving around intersections, and stopping in front of other actors or traffic lights, even in dense traffic.
arXiv Detail & Related papers (2021-01-15T05:21:25Z)
Decision-making for Autonomous Vehicles on Highway: Deep Reinforcement Learning with Continuous Action Horizon [14.059728921828938]
This paper utilizes the deep reinforcement learning (DRL) method to address the continuous-horizon decision-making problem on the highway. The running objective of the ego automated vehicle is to execute an efficient and smooth policy without collision. The PPO-DRL-based decision-making strategy is estimated from multiple perspectives, including the optimality, learning efficiency, and adaptability.
arXiv Detail & Related papers (2020-08-26T22:49:27Z)
Automated Lane Change Strategy using Proximal Policy Optimization-based Deep Reinforcement Learning [10.909595997847443]
Lane-change maneuvers are commonly executed by drivers to follow a certain routing plan, overtake a slower vehicle, adapt to a merging lane ahead, etc. In this study, we propose an automated lane change strategy using proximal policy optimization-based deep reinforcement learning. The trained agent is able to learn a smooth, safe, and efficient driving policy to make lane-change decisions.
arXiv Detail & Related papers (2020-02-07T08:43:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.