Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios
- URL: http://arxiv.org/abs/2505.13532v1
- Date: Sun, 18 May 2025 11:35:57 GMT
- Title: Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios
- Authors: Feihong Zhang, Guojian Zhan, Bin Shuai, Tianyi Zhang, Jingliang Duan, Shengbo Eben Li
- Abstract summary: We propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. A harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients. We adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H.
- Score: 16.23857092084669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios demonstrate that DSAC-H achieves efficient driving performance with near-zero safety constraint violations.
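The harmonic-gradient step can be illustrated with a short sketch. The snippet below is a hypothetical construction, not the paper's exact HPI update: it combines an efficiency gradient and a safety gradient, and when the two conflict it projects out the conflicting component of each before averaging, so the resulting direction does not regress either objective.

```python
import numpy as np

def harmonic_gradient(g_reward, g_safety, eps=1e-12):
    """Hypothetical conflict-aware combination of two policy gradients.

    g_reward: gradient of the efficient-driving objective.
    g_safety: gradient of the safety-constraint objective.
    If the two gradients conflict (negative inner product), remove the
    conflicting component of each before averaging.
    """
    g_r = np.asarray(g_reward, dtype=float)
    g_c = np.asarray(g_safety, dtype=float)
    dot = float(g_r @ g_c)
    if dot < 0.0:  # conflicting update directions
        g_r_adj = g_r - (dot / (g_c @ g_c + eps)) * g_c
        g_c_adj = g_c - (dot / (g_r @ g_r + eps)) * g_r
        return 0.5 * (g_r_adj + g_c_adj)
    return 0.5 * (g_r + g_c)

# Example: a conflicting pair; the combined direction no longer opposes
# either original gradient.
print(harmonic_gradient([1.0, 0.0], [-0.5, 1.0]))  # -> [0.4 0.7]
```

In DSAC-H, a combined direction of this kind would drive the policy update in place of the single reward gradient used by standard DSAC; the exact construction in the paper may differ.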
Related papers
- CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving [45.05135725542318]
The Combining IMitation and Reinforcement Learning (CIMRL) approach enables training driving policies in simulation by leveraging imitative motion priors and safety constraints.
By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.
arXiv Detail & Related papers (2024-06-13T07:31:29Z) - One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
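As a rough sketch of the dualization idea (notation assumed here for illustration, not taken from the paper): the constrained alignment problem is reduced to an unconstrained one by pre-optimizing the dual multiplier and then training once with it fixed.

```latex
% J_r: reward objective, J_s: safety objective with threshold b,
% \lambda \ge 0: dual multiplier (all notation assumed for illustration).
\max_{\pi} J_r(\pi) \ \text{s.t.}\ J_s(\pi) \ge b
\;\;\Rightarrow\;\;
D(\lambda) = \max_{\pi}\big[J_r(\pi) + \lambda\,(J_s(\pi) - b)\big],
\qquad
\lambda^\star = \arg\min_{\lambda \ge 0} D(\lambda).
```

The policy is then trained once on the unconstrained objective $J_r(\pi) + \lambda^\star J_s(\pi)$.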
arXiv Detail & Related papers (2024-05-29T22:12:52Z) - Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving [11.072917563013428]
Reinforcement learning (RL) has been widely used in decision-making and control tasks, but the agent is exposed to high risk during the training process.
In this paper, we propose a novel algorithm based on the long and short-term constraints (LSTC) for safe RL.
The proposed method achieves higher safety in continuous state and action tasks, and exhibits higher exploration performance in long-distance decision-making tasks.
arXiv Detail & Related papers (2024-03-27T02:41:52Z) - Towards Safe Load Balancing based on Control Barrier Functions and Deep Reinforcement Learning [0.691367883100748]
We propose a safe learning-based load balancing algorithm for Software-Defined Wide Area Networks (SD-WAN).
It is empowered by Deep Reinforcement Learning (DRL) combined with a Control Barrier Function (CBF).
We show that our approach delivers near-optimal Quality-of-Service (QoS) in terms of end-to-end delay while respecting safety requirements related to link capacity constraints.
arXiv Detail & Related papers (2024-01-10T19:43:12Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z) - Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z) - Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
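A minimal sketch of the two SUNRISE ingredients just described, with assumed array shapes and hyperparameter names rather than the paper's exact implementation:

```python
import numpy as np

def ucb_action(q_ensemble, lam=1.0):
    """Pick the candidate action with the highest upper confidence bound.

    q_ensemble: array of shape (ensemble_size, num_candidate_actions)
    holding each ensemble member's Q-value estimates.
    """
    mean, std = q_ensemble.mean(axis=0), q_ensemble.std(axis=0)
    return int(np.argmax(mean + lam * std))

def backup_weight(target_q_ensemble, temperature=10.0):
    """Weight for each Bellman target: targets on which the Q-ensemble
    disagrees (high std) are down-weighted toward 0.5, while confident
    targets keep a weight near 1.0."""
    std = target_q_ensemble.std(axis=0)
    return 1.0 / (1.0 + np.exp(temperature * std)) + 0.5
```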
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization [20.913475536020247]
This paper presents a safe reinforcement learning algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks.
PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks are used to approximate the policy function, the value function, and a newly added risk function.
To ensure the feasibility of safety-constrained problems, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates.
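A structural sketch of the three-component layout described for PCPO, using assumed class and dimension names rather than the authors' code:

```python
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class PCPONetworks(nn.Module):
    """Three approximators: policy, value (expected return), and risk
    (expected safety cost), as in the three-component framework."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.policy = mlp(obs_dim, act_dim)  # state -> action
        self.value = mlp(obs_dim, 1)         # state -> expected return
        self.risk = mlp(obs_dim, 1)          # state -> expected safety cost

# Each synchronized parallel learner would hold a copy of these networks
# and explore a different region of the state space.
nets = PCPONetworks(obs_dim=10, act_dim=2)
```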
arXiv Detail & Related papers (2020-03-03T02:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.