Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving
- URL: http://arxiv.org/abs/2506.03568v2
- Date: Thu, 05 Jun 2025 02:35:36 GMT
- Title: Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving
- Authors: Li Zeqiao, Wang Yijing, Wang Haoyu, Li Zheng, Li Peng, Zuo Zhiqiang, Hu Chuan
- Abstract summary: This paper develops a confidence-guided human-AI collaboration (C-HAC) strategy to overcome the safe-exploration and heavy human-intervention limitations of existing approaches. C-HAC achieves rapid and stable learning of human-guided policies with minimal human interaction. Experiments across diverse driving scenarios reveal that C-HAC significantly outperforms conventional methods in terms of safety, efficiency, and overall performance.
- Score: 1.4063588986150455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous driving promises significant advancements in mobility, road safety, and traffic efficiency, yet reinforcement learning and imitation learning face safe-exploration and distribution-shift challenges. Although human-AI collaboration alleviates these issues, it often relies heavily on extensive human intervention, which increases costs and reduces efficiency. This paper develops a confidence-guided human-AI collaboration (C-HAC) strategy to overcome these limitations. First, C-HAC employs a distributional proxy value propagation method within the distributional soft actor-critic (DSAC) framework. By leveraging return distributions to represent human intentions, C-HAC achieves rapid and stable learning of human-guided policies with minimal human interaction. Subsequently, a shared control mechanism is activated to integrate the learned human-guided policy with a self-learning policy that maximizes cumulative rewards. This enables the agent to explore independently and continuously enhance its performance beyond human guidance. Finally, a policy confidence evaluation algorithm capitalizes on DSAC's return distribution networks to facilitate dynamic switching between human-guided and self-learning policies via a confidence-based intervention function. This ensures the agent can pursue optimal policies while maintaining safety and performance guarantees. Extensive experiments across diverse driving scenarios reveal that C-HAC significantly outperforms conventional methods in terms of safety, efficiency, and overall performance, achieving state-of-the-art results. The effectiveness of the proposed method is further validated through real-world road tests in complex traffic conditions. The videos and code are available at: https://github.com/lzqw/C-HAC.
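The abstract's third component, a confidence-based intervention function built on DSAC's return distribution networks, is what decides when the agent may act on its own self-learning policy. The snippet below is a minimal illustrative sketch of how such a switch could be organized, assuming a `return_dist(state, action)` interface that yields the mean and standard deviation of a return distribution; the margin rule and the `kappa` parameter are assumptions made for this example, not the paper's actual formulation (see the linked repository for the real implementation).

```python
# Minimal illustrative sketch, not the paper's implementation: the return
# distribution interface and the margin-based switching rule are assumptions.
def confidence_guided_action(state, human_policy, self_policy, return_dist,
                             kappa=0.5):
    """Switch between the human-guided and self-learning policies.

    `return_dist(state, action)` is assumed to return the mean and standard
    deviation of a DSAC-style return distribution for the given pair.
    """
    a_human = human_policy(state)
    a_self = self_policy(state)

    mu_h, sigma_h = return_dist(state, a_human)
    mu_s, sigma_s = return_dist(state, a_self)

    # One plausible confidence proxy: act autonomously only when the expected
    # return of the self-learning action beats the human-guided one by an
    # uncertainty-aware margin, so exploration never undercuts the
    # human-guided safety baseline.
    if mu_s - mu_h > kappa * (sigma_h + sigma_s):
        return a_self   # confident enough to act on the self-learning policy
    return a_human      # otherwise defer to the human-guided policy
```

In this toy rule a larger `kappa` is more conservative: the agent keeps following the human-guided policy unless its own return estimate is clearly better even after accounting for the uncertainty of both estimates.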
Related papers
- Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow [8.452082837356162]
This study proposes a trust-based multi-agent reinforcement learning (Trust-MARL) framework to address the challenge of cooperative on-ramp merging. Trust-MARL enhances global traffic efficiency by leveraging inter-agent trust to improve bottleneck throughput and mitigate traffic shockwaves. An extensive set of ablation studies and comparative experiments validates the effectiveness of the proposed Trust-MARL approach.
arXiv Detail & Related papers (2025-06-14T18:35:10Z) - Interactive Double Deep Q-network: Integrating Human Interventions and Evaluative Predictions in Reinforcement Learning of Autonomous Driving [16.379623042562415]
This study introduces Interactive Double Deep Q-network (iDDQN), a Human-in-the-Loop (HITL) approach that enhances Reinforcement Learning (RL). The proposed iDDQN method modifies the Q-value update equation to integrate human and agent actions, establishing a collaborative approach for policy development (a minimal illustrative sketch of such a blended update appears after this list). Empirical results in simulated autonomous driving scenarios demonstrate that iDDQN outperforms established approaches.
arXiv Detail & Related papers (2025-04-28T05:25:18Z) - A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms [11.569616198957887]
Reinforcement learning (RL) shows great potential for optimizing multi-vehicle cooperative driving strategies through the state-action-reward feedback loop. This paper proposes a differentiated reward method based on steady-state transition systems, which incorporates state transition gradient information into the reward design.
arXiv Detail & Related papers (2025-02-01T07:16:15Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving [1.5361702135159845]
Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency.
Inspired by the human learning process, we propose Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF).
PE-RLHF guarantees the learned policy will perform at least as well as the given physics-based policy, even when human feedback quality deteriorates.
arXiv Detail & Related papers (2024-09-01T22:20:32Z) - MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention [81.56607128684723]
We introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention.
MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions.
It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function.
arXiv Detail & Related papers (2024-06-24T01:51:09Z) - Adversarial Driving Behavior Generation Incorporating Human Risk Cognition for Autonomous Vehicle Evaluation [23.476885023669524]
This paper focuses on the development of a novel framework for generating adversarial driving behavior of background vehicles.
The adversarial behavior is learned by a reinforcement learning (RL) approach that incorporates cumulative prospect theory (CPT).
A comparative case study regarding the cut-in scenario is conducted on a high fidelity Hardware-in-the-Loop (HiL) platform.
arXiv Detail & Related papers (2023-09-29T04:09:46Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation [72.24964965882783]
Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error. Real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies. We introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficiency in RL-based robotic navigation without modifying the reward function.
arXiv Detail & Related papers (2023-06-09T18:45:15Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
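The iDDQN entry above describes modifying the Q-value update equation so that human and agent actions are integrated in a single update. The sketch below is a hypothetical illustration of one such blended temporal-difference target; the tabular `q_table`, the blending weight `beta`, and the intervention handling are assumptions made for this example, not that paper's actual formulation.

```python
# Hypothetical sketch of a human-in-the-loop TD target; `beta`, the tabular
# q_table, and the intervention handling are assumptions for illustration.
def hitl_td_target(q_table, reward, next_state, agent_action,
                   human_action=None, gamma=0.99, beta=0.7):
    """Blend the agent's next-state value with the value of the human's chosen
    action whenever a human intervention occurred at this step."""
    q_agent = q_table[(next_state, agent_action)]
    if human_action is None:          # no intervention: standard TD target
        return reward + gamma * q_agent
    q_human = q_table[(next_state, human_action)]
    # Weight the human-selected action more heavily than the agent's own choice.
    return reward + gamma * (beta * q_human + (1.0 - beta) * q_agent)
```

A dictionary keyed by (state, action) pairs stands in for a Q-network here; in a deep Q-learning setting the same blend would be applied to the target network's outputs.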
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information listed above and is not responsible for any consequences of its use.