Safe Reinforcement Learning for Autonomous Vehicles through Parallel
Constrained Policy Optimization
- URL: http://arxiv.org/abs/2003.01303v1
- Date: Tue, 3 Mar 2020 02:53:30 GMT
- Title: Safe Reinforcement Learning for Autonomous Vehicles through Parallel
Constrained Policy Optimization
- Authors: Lu Wen, Jingliang Duan, Shengbo Eben Li, Shaobing Xu, Huei Peng
- Abstract summary: This paper presents a safe reinforcement learning algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks.
PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks are used to approximate the policy function, value function and a newly added risk function.
To ensure the feasibility of safety-constrained problems, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates.
- Score: 20.913475536020247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is attracting increasing interest in autonomous
driving due to its potential to solve complex classification and control
problems. However, existing RL algorithms are rarely applied to real vehicles
because of two predominant problems: their behaviours are unexplainable, and
they cannot guarantee safety under new scenarios. This paper presents a safe
RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two
autonomous driving tasks. PCPO extends today's common actor-critic
architecture to a three-component learning framework, in which three neural
networks are used to approximate the policy function, value function and a
newly added risk function, respectively. Meanwhile, a trust region constraint
is added to allow large update steps without breaking the monotonic
improvement condition. To ensure the feasibility of safety-constrained
problems, synchronized parallel learners are employed to explore different
state spaces, which accelerates learning and policy updates. Simulations of
two autonomous driving scenarios confirm that the algorithm ensures safety
while achieving fast learning.
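The paper itself provides no code here; as a rough illustration of the three-component framework described above (policy, value, and risk networks), the following is a minimal PyTorch-style sketch. All names are assumptions, and the risk constraint is folded in as a simple penalty purely for brevity; PCPO's actual update enforces it as a hard constraint inside a trust region.

```python
# Hypothetical sketch of a three-component actor-critic with a risk network.
# PCPO's real update solves a trust-region constrained problem; here the risk
# constraint is approximated by a penalty term purely for illustration.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

obs_dim, act_dim, risk_limit = 8, 2, 0.1   # dimensions and limit are assumptions
policy = mlp(obs_dim, act_dim)   # approximates the policy function
value = mlp(obs_dim, 1)          # approximates the state-value function
risk = mlp(obs_dim, 1)           # approximates the newly added risk function
lam = 1.0                        # penalty multiplier (assumption)

def policy_loss(obs, act, advantage):
    # Squared-error surrogate for a unit-variance Gaussian log-prob
    # (constants dropped), penalized by predicted risk above the limit.
    mean = policy(obs)
    logp = -((act - mean) ** 2).sum(-1)
    excess_risk = torch.relu(risk(obs).squeeze(-1) - risk_limit)
    return -(logp * advantage).mean() + lam * excess_risk.mean()
```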
Related papers
- RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes [57.319845580050924]
We propose a reinforcement learning framework that combines risk-sensitive control with an adaptive action space curriculum.
We show that our algorithm is capable of learning high-speed policies for a real-world off-road driving task.
arXiv Detail & Related papers (2024-05-07T23:32:36Z)
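The RACER abstract only names its mechanism; one plausible reading of an "adaptive action space curriculum" is to widen the allowed action range as the recent crash rate drops. The sketch below follows that reading; the class, thresholds, and growth rate are all invented for illustration.

```python
# Hypothetical action-space curriculum: widen the allowed speed range only
# while the recent crash rate stays low. All thresholds are illustrative.
from collections import deque

class ActionCurriculum:
    def __init__(self, v_max_final=10.0):
        self.v_max = 2.0                  # start with a conservative speed cap
        self.v_max_final = v_max_final
        self.crashes = deque(maxlen=100)  # rolling window of episode outcomes

    def report_episode(self, crashed: bool):
        self.crashes.append(crashed)
        crash_rate = sum(self.crashes) / len(self.crashes)
        if len(self.crashes) == self.crashes.maxlen and crash_rate < 0.05:
            # Expand the action space by 10%, capped at the final range.
            self.v_max = min(self.v_max * 1.1, self.v_max_final)

    def clip_action(self, v):
        return max(-self.v_max, min(v, self.v_max))
```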
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning [62.997667081978825]
We compare two learnable navigation policies: safe and unsafe.
The safe policy takes the constraints into account, while the other does not.
We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and causes fewer collisions during training, without sacrificing the overall performance.
arXiv Detail & Related papers (2023-07-27T01:04:57Z)
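The exact constraint formulation of the navigation paper above is not given here; a common way to make a policy "take the constraints into account" is to add a clearance-based cost to the reward, as in this illustrative sketch. The threshold and weight are assumptions, not the paper's values.

```python
# Illustrative clearance-shaped reward: penalize proximity to the nearest
# obstacle so the learned policy prefers trajectories with more clearance.
def shaped_reward(task_reward: float, min_obstacle_dist: float,
                  safe_dist: float = 0.5, weight: float = 1.0) -> float:
    clearance_cost = max(0.0, safe_dist - min_obstacle_dist)
    return task_reward - weight * clearance_cost
```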
- Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning [48.667697255912614]
Mean-field reinforcement learning addresses the policy of a representative agent interacting with an infinite population of identical agents.
We propose Safe-M$3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
arXiv Detail & Related papers (2023-06-29T15:57:07Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
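The decoupling in the dual-agent paper above can be pictured as a baseline agent proposing an action and a safe agent correcting it. The sketch below is our guess at the control flow, not the authors' code; the risk estimator and threshold are assumptions.

```python
# Hypothetical dual-agent control flow: a baseline agent proposes an action,
# and a separate safe agent overrides it only when predicted risk is high.
def dual_agent_step(obs, baseline_policy, safe_policy, risk_estimate, threshold=0.1):
    action = baseline_policy(obs)                # task-focused proposal
    if risk_estimate(obs, action) > threshold:   # risk-awareness check
        action = safe_policy(obs)                # safety correction from baseline
    return action
```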
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
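USL combines safety optimization with safety projection; as a generic illustration of the projection half, the sketch below nudges a proposed action by gradient descent on a learned cost critic until a state-wise cost budget holds. The critic interface, step size, and iteration count are assumptions, not the paper's method.

```python
# Illustrative safety projection: adjust the proposed action by gradient
# descent on a learned cost critic (assumed to return a scalar tensor)
# until the state-wise cost budget is satisfied.
import torch

def project_action(obs, action, cost_critic, budget=0.0, lr=0.05, steps=20):
    a = action.clone().detach().requires_grad_(True)
    for _ in range(steps):
        cost = cost_critic(obs, a)
        if cost.item() <= budget:       # already satisfies the constraint
            break
        cost.backward()
        with torch.no_grad():
            a -= lr * a.grad            # move toward lower predicted cost
        a.grad.zero_()
    return a.detach()
```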
- Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z)
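The multi-step policy evaluation described above can be pictured as rolling a model forward under the current policy and checking the time-varying constraint at every step. The sketch below is a generic version of that idea with invented names, not the paper's mechanism.

```python
# Hypothetical multi-step safety check: roll a dynamics model forward under
# the current policy and test the time-varying constraint at each step.
def multi_step_safe(x, policy, model, constraint, horizon=10):
    # constraint(x, t) -> True if state x satisfies the constraint at time t
    for t in range(horizon):
        u = policy(x)
        x = model(x, u)                 # one-step model prediction
        if not constraint(x, t + 1):    # time-varying state constraint
            return False                # predicted violation within horizon
    return True
```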
- Minimizing Safety Interference for Safe and Comfortable Automated Driving with Distributional Reinforcement Learning [3.923354711049903]
We propose a distributional reinforcement learning framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility.
We show that our algorithm learns policies that can still drive reliably when perception noise is twice as high as in the training configuration, for automated merging and crossing at occluded intersections.
arXiv Detail & Related papers (2021-07-15T13:36:55Z)
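Tuning conservativity at run-time with a distributional critic is often done by scoring actions under a risk measure such as CVaR over the predicted return distribution; the sketch below shows that idea with a quantile critic. The critic interface and names are assumptions, not this paper's implementation.

```python
# Illustrative run-time conservativity knob: score actions by the CVaR of a
# quantile-based return distribution. Smaller alpha = more conservative.
import numpy as np

def cvar_action(obs, actions, quantile_critic, alpha=0.25):
    # quantile_critic(obs, a) -> array of return quantiles (assumed interface)
    def cvar(a):
        q = np.sort(quantile_critic(obs, a))
        k = max(1, int(alpha * len(q)))
        return q[:k].mean()             # mean of the worst alpha-fraction
    return max(actions, key=cvar)
```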
- Model-based Safe Reinforcement Learning using Generalized Control Barrier Function [6.556257209888797]
This paper proposes a model-based feasibility enhancement technique for constrained RL.
By using the model information, the policy can be optimized safely without violating actual safety constraints.
The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
arXiv Detail & Related papers (2021-03-02T08:17:38Z)
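A discrete-time control barrier function keeps a safety measure h(x) nonnegative by requiring h(x_{t+1}) >= (1 - lambda) * h(x_t). The sketch below filters RL actions through that condition using a one-step model; the fallback behaviour and all names are assumptions rather than the paper's generalized formulation.

```python
# Illustrative discrete-time control-barrier-function filter: accept the RL
# action only if the one-step model prediction keeps the barrier condition
# h(x_next) >= (1 - lam) * h(x); otherwise fall back to a safe action.
def cbf_filter(x, rl_action, safe_action, model, h, lam=0.5):
    x_next = model(x, rl_action)
    if h(x_next) >= (1.0 - lam) * h(x):   # barrier condition holds
        return rl_action
    return safe_action                    # conservative fallback (assumption)
```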
- Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving [41.54021613421446]
In near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences.
We propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
arXiv Detail & Related papers (2020-07-01T01:41:45Z)
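The H-ReIL hierarchy described above, with a high-level RL policy switching among low-level IL driving modes, can be sketched as follows; the two-mode setup and mode names are illustrative assumptions.

```python
# Hypothetical H-ReIL-style hierarchy: a high-level RL policy picks a driving
# mode each step, and the chosen low-level imitation policy emits the control.
def hierarchical_step(obs, high_level_policy, low_level_policies):
    # low_level_policies: dict of IL policies, e.g. {"aggressive": ..., "timid": ...}
    mode = high_level_policy(obs)           # discrete mode choice
    return low_level_policies[mode](obs)    # low-level IL policy outputs control
```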