From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning
- URL: http://arxiv.org/abs/2510.06038v1
- Date: Tue, 07 Oct 2025 15:33:29 GMT
- Title: From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning
- Authors: Li Zeqiao, Wang Yijing, Wang Haoyu, Li Zheng, Li Peng, Liu Wenfei, Zuo Zhiqiang
- Abstract summary: We propose a reward-free, active human-in-the-loop learning method called Human-Guided Distributional Soft Actor-Critic (H-DSAC). Our method combines Proxy Value Propagation (PVP) and Distributional Soft Actor-Critic (DSAC) to enable efficient and safe training in real-world environments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving with reinforcement learning (RL) has significant potential. However, applying RL in real-world settings remains challenging due to the need for safe, efficient, and robust learning. Incorporating human expertise into the learning process can help overcome these challenges by reducing risky exploration and improving sample efficiency. In this work, we propose a reward-free, active human-in-the-loop learning method called Human-Guided Distributional Soft Actor-Critic (H-DSAC). Our method combines Proxy Value Propagation (PVP) and Distributional Soft Actor-Critic (DSAC) to enable efficient and safe training in real-world environments. The key innovation is the construction of a distributed proxy value function within the DSAC framework. This function encodes human intent by assigning higher expected returns to expert demonstrations and penalizing actions that require human intervention. By extrapolating these labels to unlabeled states, the policy is effectively guided toward expert-like behavior. With a well-designed state space, our method achieves real-world driving policy learning within practical training times. Results from both simulation and real-world experiments demonstrate that our framework enables safe, robust, and sample-efficient learning for autonomous driving.
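The proxy-value idea in the abstract (higher values for expert actions, penalties for actions that trigger an intervention, and extrapolation to unlabeled states) can be illustrated with a toy example. The sketch below follows the general Proxy Value Propagation recipe under stated assumptions: during a human takeover the human's action is regressed toward +1 and the overridden agent action toward -1, while a reward-free TD backup (r = 0) spreads those labels to other states. The ±1 labels, the tabular Q-array, the learning rate, and the function name `proxy_value_update` are assumptions for illustration only. H-DSAC itself builds a distributional critic inside DSAC, which this scalar sketch does not model.

```python
import numpy as np


def proxy_value_update(q, human_batch, agent_batch, lr=0.1, gamma=0.99):
    """One gradient step of a PVP-style proxy value loss on a tabular Q (sketch).

    q           : (n_states, n_actions) array of Q-values.
    human_batch : (s, a_agent, a_human) tuples recorded during human takeovers.
    agent_batch : (s, a, s_next) transitions that carry no reward signal.
    """
    q = q.copy()
    grad = np.zeros_like(q)
    # Assumed proxy labels: human action -> +1, overridden agent action -> -1.
    for s, a_agent, a_human in human_batch:
        grad[s, a_human] += 2.0 * (q[s, a_human] - 1.0)
        grad[s, a_agent] += 2.0 * (q[s, a_agent] + 1.0)
    # Reward-free TD backup (r = 0); the target is treated as a constant
    # (semi-gradient), which propagates the labels to unlabeled states.
    for s, a, s_next in agent_batch:
        target = gamma * q[s_next].max()
        grad[s, a] += 2.0 * (q[s, a] - target)
    return q - lr * grad


# Toy usage: in state 0 the agent chose action 1 and the human overrode it
# with action 0; state 1 (action 0) leads to state 0, state 2 (action 1) to state 1.
q = np.zeros((3, 2))
human_batch = [(0, 1, 0)]
agent_batch = [(1, 0, 0), (2, 1, 1)]
for _ in range(200):
    q = proxy_value_update(q, human_batch, agent_batch)
print(np.round(q, 3))
```

In this toy run q[0, 0] approaches +1 and q[0, 1] approaches -1, and the zero-reward backups give q[1, 0] ≈ 0.99 and q[2, 1] ≈ 0.98, so states that lead toward expert-like behavior inherit high value even though no environment reward is ever used.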
Related papers
- Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving
This paper develops a confidence-guided human-AI collaboration (C-HAC) strategy to overcome these limitations. C-HAC achieves rapid and stable learning of human-guided policies with minimal human interaction. Experiments across diverse driving scenarios reveal that C-HAC significantly outperforms conventional methods in terms of safety, efficiency, and overall performance.
Date: 2025-06-04T04:31:10Z
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks. Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies. We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution.
Date: 2024-10-29T08:12:20Z
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM).
In this paradigm, the human expert serves as a mentor to the AI agent, while the agent could be guided to minimize traffic flow disturbance.
Date: 2024-01-06T08:30:14Z
- RLIF: Interactive Imitation Learning as Reinforcement Learning
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
Date: 2023-11-21T21:05:21Z
- Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition
We propose a novel method for skill learning in robotic manipulation called Tactile Active Inference Reinforcement Learning (Tactile-AIRL).
To enhance the performance of reinforcement learning (RL), we introduce active inference, which integrates model-based techniques and intrinsic curiosity into the RL process.
We demonstrate that our method achieves notably high training efficiency in non-prehensile object-pushing tasks.
Date: 2023-11-19T10:19:22Z
- Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios
We show how imitation learning combined with reinforcement learning can substantially improve the safety and reliability of driving policies.
We train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision likelihood.
Date: 2022-12-21T23:59:33Z
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
Date: 2022-06-20T07:19:38Z
- Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
We develop a novel human-in-the-loop learning method called Human-AI Copilot Optimization (HACO).
The proposed HACO effectively utilizes data from both trial-and-error exploration and the human's partial demonstrations to train a high-performing agent.
Experiments show that HACO achieves substantially high sample efficiency in the safe driving benchmark.
Date: 2022-02-17T06:29:46Z
- Autonomous Reinforcement Learning: Formalism and Benchmarking
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
Date: 2021-12-17T16:28:06Z
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
Date: 2020-04-27T17:38:53Z
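The RLIF entry in the list describes using human intervention signals themselves as rewards. A minimal sketch of that relabeling follows, assuming the simple scheme of a reward of -1 on the step where a takeover begins and 0 on every other step; the exact labeling used in RLIF may differ, and the function name `intervention_rewards` is hypothetical.

```python
from typing import List, Sequence


def intervention_rewards(intervened: Sequence[bool]) -> List[float]:
    """Relabel a trajectory's rewards from per-step intervention flags (sketch).

    Assumed scheme: -1.0 on the step where an intervention starts, 0.0 on
    every other step, so an off-policy learner is pushed away from the
    states and actions that trigger a human takeover.
    """
    rewards: List[float] = []
    previously_intervened = False
    for flag in intervened:
        # Penalize only the onset of an intervention, not every human-driven step.
        rewards.append(-1.0 if flag and not previously_intervened else 0.0)
        previously_intervened = flag
    return rewards


# Example: two separate takeovers, the first lasting two steps.
print(intervention_rewards([False, False, True, True, False, True]))
```

Penalizing only the onset keeps the signal attached to the decision that caused the takeover; the human-driven steps can still be stored as ordinary off-policy transitions.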