Learn Zero-Constraint-Violation Policy in Model-Free Constrained
Reinforcement Learning
- URL: http://arxiv.org/abs/2111.12953v1
- Date: Thu, 25 Nov 2021 07:24:30 GMT
- Title: Learn Zero-Constraint-Violation Policy in Model-Free Constrained
Reinforcement Learning
- Authors: Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Wenchao Sun,
Jianyu Chen
- Abstract summary: We propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions.
The safety index is designed to increase rapidly for potentially dangerous actions.
We claim that we can learn the energy function in a model-free manner similar to learning a value function.
- Score: 7.138691584246846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the trial-and-error mechanism of reinforcement learning (RL), a
notorious contradiction arises when we expect to learn a safe policy: how can a
safe policy be learned without enough data or a prior model of the dangerous
region? Existing methods mostly apply a posterior penalty to dangerous actions,
meaning the agent is not penalized until it has actually experienced danger. As
a result, the agent cannot learn a zero-violation policy even after
convergence; otherwise, it would no longer receive any penalty and would lose
its knowledge of the danger. In this paper, we propose the safe set
actor-critic (SSAC) algorithm, which confines the policy update using
safety-oriented energy functions, or safety indexes. The safety index is
designed to increase rapidly for potentially dangerous actions, which allows us
to locate the safe set in the action space, or the control safe set. Therefore,
we can identify dangerous actions before taking them and obtain a
zero-constraint-violation policy after convergence. We claim that the energy
function can be learned in a model-free manner, similar to learning a value
function. By using the energy function transition as the constraint objective,
we formulate a constrained RL problem. We prove that our Lagrangian-based
solution ensures that the learned policy converges to the constrained optimum
under some assumptions. The proposed algorithm is evaluated on both complex
simulation environments and a hardware-in-the-loop (HIL) experiment with a real
controller from an autonomous vehicle. Experimental results suggest that the
converged policy in all environments achieves zero constraint violations and
performance comparable to model-based baselines.
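The abstract describes the key mechanism: a learned safety-index prediction enters the policy update as a constraint with a Lagrange multiplier. Below is a minimal PyTorch-style sketch of that coupling, assuming a deterministic actor and a critic `q_safety` that predicts the change of the safety index caused by an action; all names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative networks only; the actual SSAC actor-critic machinery differs.
obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
q_reward = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
# Safety critic: predicts the change of the safety index phi caused by an action,
# learned model-free from observed phi(s') - phi(s) transitions, like a value function.
q_safety = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

log_lambda = torch.zeros(1, requires_grad=True)            # Lagrange multiplier (log-space)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
lambda_opt = torch.optim.Adam([log_lambda], lr=1e-3)

def lagrangian_step(obs):
    act = actor(obs)
    x = torch.cat([obs, act], dim=-1)
    lam = log_lambda.exp()
    # Constraint: the predicted safety-index increase should be non-positive.
    constraint = torch.relu(q_safety(x)).mean()
    actor_loss = -q_reward(x).mean() + lam.detach() * constraint
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Dual ascent: grow lambda while the constraint is still violated.
    lambda_loss = -(lam * constraint.detach())
    lambda_opt.zero_grad(); lambda_loss.backward(); lambda_opt.step()

lagrangian_step(torch.randn(32, obs_dim))
```

Critic training for `q_reward` and `q_safety` is omitted; the sketch only shows how a learned safety-index prediction can enter the actor loss as a constraint with a learned multiplier.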
Related papers
- Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning [5.862025534776996]
Reinforcement Learning for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment.
In such methods, if agents are in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized.
We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy.
In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem.
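Stated as a constrained optimization problem, the formulation above could be written roughly as (our paraphrase; the symbols $J$, $H$, $\pi_0$, and the budget $\delta$ are not taken from the paper)

$$\max_{\pi} \; J(\pi) \quad \text{s.t.} \quad \mathbb{E}_{\pi}\big[ H(\pi \,;\, \pi_0) \big] \le \delta,$$

where $\pi_0$ is the default safe policy and $H$ measures the counterfactual harm of following $\pi$ instead of $\pi_0$.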
arXiv Detail & Related papers (2024-05-19T20:33:21Z)
- Policy Bifurcation in Safe Reinforcement Learning [35.75059015441807]
In some scenarios, the feasible policy should be discontinuous or multi-valued, and interpolating between discontinuous local optima can inevitably lead to constraint violations.
We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of bifurcation in safe RL.
We propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output.
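As a hedged illustration of "a Gaussian mixture distribution as the policy output", here is a minimal PyTorch policy head; the class name, layer sizes, and component count are assumptions, and MUPO's actual architecture and training objective differ.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

class MixturePolicy(nn.Module):
    """Hypothetical Gaussian-mixture policy head (illustrative, not MUPO's code)."""
    def __init__(self, obs_dim, act_dim, n_components=2, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.logits = nn.Linear(hidden, n_components)               # mixture weights
        self.means = nn.Linear(hidden, n_components * act_dim)      # component means
        self.log_stds = nn.Linear(hidden, n_components * act_dim)   # component log-stds
        self.n, self.act_dim = n_components, act_dim

    def forward(self, obs):
        h = self.backbone(obs)
        mix = Categorical(logits=self.logits(h))
        means = self.means(h).view(-1, self.n, self.act_dim)
        stds = self.log_stds(h).view(-1, self.n, self.act_dim).clamp(-5, 2).exp()
        comp = Independent(Normal(means, stds), 1)
        return MixtureSameFamily(mix, comp)      # multimodal action distribution

policy = MixturePolicy(obs_dim=8, act_dim=2)
dist = policy(torch.randn(4, 8))
actions = dist.sample()                          # shape (4, 2)
log_prob = dist.log_prob(actions)                # usable in a policy-gradient loss
```

A mixture head can place probability mass on disjoint safe modes instead of interpolating between them, which is the failure mode the paper identifies.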
arXiv Detail & Related papers (2024-03-19T15:54:38Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
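A minimal sketch of the multiplicative combination described above, assuming PyTorch critics; the names, architectures, and the exact way the two critics are combined and trained are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
reward_critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
safety_critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())   # P(constraint violation)

def multiplicative_value(obs, act):
    x = torch.cat([obs, act], dim=-1)
    p_violation = safety_critic(x)      # probability of constraint violation, in [0, 1]
    q_reward = reward_critic(x)         # constraint-free return estimate
    # The safety critic discounts the reward critic: likely-unsafe actions are worth less.
    return (1.0 - p_violation) * q_reward

q = multiplicative_value(torch.randn(32, obs_dim), torch.randn(32, act_dim))
```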
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented Markov Decision Processes (MDPs).
We show that the Saute MDP allows viewing the safety-augmentation problem from a different perspective, enabling new features.
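A rough sketch of the state-augmentation idea (extend the observation with the remaining safety budget and make exhausting it catastrophic), written as plain helper functions; the normalization, penalty value, and names are assumptions, not the authors' implementation.

```python
import numpy as np

def augment(obs, remaining_budget, total_budget):
    # Append the normalized remaining safety budget as an extra observation feature.
    return np.append(obs, remaining_budget / total_budget)

def augmented_step(obs, reward, cost, remaining_budget, total_budget, penalty=-100.0):
    remaining_budget -= cost
    if remaining_budget < 0.0:
        reward = penalty            # budget exhausted: override the reward
    return augment(obs, max(remaining_budget, 0.0), total_budget), reward, remaining_budget
```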
arXiv Detail & Related papers (2022-02-14T08:57:01Z)
- Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian [5.686699342802045]
We propose a separated proportional-integral Lagrangian algorithm to enhance RL safety under uncertainty.
We demonstrate that our method can reduce the oscillations and conservatism of the RL policy in a car-following simulation.
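For intuition, here is a generic proportional-integral multiplier update in the spirit described; the gains are arbitrary, and the "separated" treatment that gives the algorithm its name is not reproduced.

```python
def pi_multiplier(violation, integral, kp=0.1, ki=0.01):
    # The integral part accumulates violation like a plain Lagrange multiplier;
    # the proportional part reacts to the current violation and damps oscillations.
    integral = max(0.0, integral + ki * violation)
    lam = max(0.0, kp * violation + integral)
    return lam, integral

lam, integral = 0.0, 0.0
for violation in [0.5, 0.3, 0.1, -0.2, -0.1]:     # example constraint-violation signal
    lam, integral = pi_multiplier(violation, integral)
```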
arXiv Detail & Related papers (2021-08-26T07:34:14Z)
- Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
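A hypothetical sketch of an advantage-based intervention gate consistent with the summary above; the criterion, threshold, and backup policy are illustrative, and SAILR's actual rule and guarantees are given in the paper.

```python
def safe_action(obs, proposed_action, advantage_fn, backup_policy, threshold=0.0):
    # advantage_fn scores how much the proposed action worsens safety relative to
    # the intervention (backup) policy; intervene when it exceeds the threshold.
    if advantage_fn(obs, proposed_action) > threshold:
        return backup_policy(obs), True       # intervened with the backup action
    return proposed_action, False             # original action kept
```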
arXiv Detail & Related papers (2021-06-16T20:28:56Z)