Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2310.03718v2
- Date: Mon, 29 Apr 2024 20:06:18 GMT
- Title: Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
- Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao
- Abstract summary: We introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules.
Our experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance.
This makes our approach suitable for real-world dynamic applications.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.
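To make the conditioning idea concrete, below is a minimal sketch (in PyTorch, not the authors' code) of how a cost threshold can be fed to both the critic and the policy as an extra input, so that a single trained agent can be queried with different constraint budgets at deployment. The class and argument names are hypothetical, and the sketch illustrates only the conditioning interface, not the paper's VVE or CVI procedures.

```python
# Minimal sketch (not the authors' implementation): the cost threshold is
# treated as an extra input to both the critic and the policy, so a single
# trained agent can be queried with different constraint budgets at deployment.
import torch
import torch.nn as nn


class ThresholdConditionedCritic(nn.Module):
    """Approximates Q(s, a | epsilon) across a range of cost thresholds epsilon."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act, threshold):
        # threshold has shape (batch, 1) and is concatenated like any other feature
        return self.net(torch.cat([obs, act, threshold], dim=-1))


class ThresholdConditionedPolicy(nn.Module):
    """Gaussian policy pi(a | s, epsilon) conditioned on the cost threshold."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs, threshold):
        h = self.body(torch.cat([obs, threshold], dim=-1))
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())


# At deployment, the same trained policy can be queried with a new budget, e.g.:
# policy = ThresholdConditionedPolicy(obs_dim=8, act_dim=2)
# obs = torch.zeros(1, 8)
# action = policy(obs, torch.full((1, 1), 25.0)).sample()
```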
Related papers
- One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based scenarios.
arXiv Detail & Related papers (2024-05-29T22:12:52Z) - Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning [4.14360329494344]
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades.
Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety.
Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process.
We propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment.
arXiv Detail & Related papers (2024-02-24T20:01:15Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications in advance due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Iterative Reachability Estimation for Safe Reinforcement Learning [23.942701020636882]
We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained reinforcement learning (RL) environments.
In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety.
We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo.
arXiv Detail & Related papers (2023-09-24T02:36:42Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
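As a rough illustration of the multiplicative composition described in this entry, the sketch below discounts the reward critic's constraint-free return estimate by the safety critic's estimated probability of staying violation-free. This is a minimal sketch of the general idea only; the exact form used in the paper may differ, and the function name is hypothetical.

```python
# Sketch only: combine a reward critic and a safety critic multiplicatively.
# q_reward estimates the constraint-free return; p_violation estimates the
# probability of a future constraint violation. The paper's exact form may differ.
def multiplicative_value(q_reward: float, p_violation: float) -> float:
    p_safe = 1.0 - p_violation  # probability of remaining violation-free
    return p_safe * q_reward    # risky state-actions get their value discounted


# Example: a high predicted return is heavily discounted when violation is likely.
print(multiplicative_value(q_reward=10.0, p_violation=0.75))  # -> 2.5
```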
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
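The sketch below illustrates one generic way a baseline agent and a safe agent can be composed at action-selection time, in the spirit of the decoupled framework described in this entry. It is an assumption-laden illustration, not the paper's architecture; `risk_fn`, `safe_policy`, and the risk budget are placeholders.

```python
# Sketch of a generic baseline + safety-correction composition at action time.
# This illustrates the dual-agent idea only; risk_fn, safe_policy, and the
# risk budget are hypothetical placeholders, not the paper's components.
def dual_agent_act(state, baseline_policy, safe_policy, risk_fn, risk_budget=0.1):
    action = baseline_policy(state)           # task-oriented proposal
    if risk_fn(state, action) > risk_budget:  # estimated risk exceeds the budget?
        action = safe_policy(state, action)   # let the safe agent correct it
    return action
```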
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Safety-Constrained Policy Transfer with Successor Features [19.754549649781644]
We propose a Constrained Markov Decision Process (CMDP) formulation that enables the transfer of policies and adherence to safety constraints.
Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation.
Our experiments in simulated domains show that our approach is effective; it visits unsafe states less frequently and outperforms alternative state-of-the-art methods when taking safety constraints into account.
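For intuition, the sketch below shows one plausible form of a Lagrangian-weighted generalized policy improvement (GPI) step over per-policy reward and cost value estimates (such as those derived from successor features). It is a hedged illustration of the idea, not the paper's exact action-selection rule; all names are hypothetical.

```python
# Sketch of one plausible Lagrangian-weighted generalized policy improvement
# (GPI) step; an illustration of the idea, not the paper's exact selection rule.
import numpy as np


def constrained_gpi_action(q_reward: np.ndarray, q_cost: np.ndarray, lam: float) -> int:
    """Pick an action from per-source-policy value estimates for one state.

    q_reward, q_cost: arrays of shape (num_source_policies, num_actions) with
    reward and cost action-values (e.g., obtained from successor features).
    lam: Lagrange multiplier trading off reward against constraint cost.
    """
    penalized = q_reward - lam * q_cost          # Lagrangian-penalized values
    best_over_policies = penalized.max(axis=0)   # GPI: max over source policies
    return int(best_over_policies.argmax())      # greedy action under the penalty


# Example with 2 source policies and 3 actions:
qr = np.array([[1.0, 2.0, 0.5], [0.8, 1.5, 2.5]])
qc = np.array([[0.1, 1.0, 0.0], [0.0, 0.2, 2.0]])
print(constrained_gpi_action(qr, qc, lam=1.0))  # -> 1 (high-cost actions lose out)
```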
arXiv Detail & Related papers (2022-11-10T06:06:36Z) - Constrained Variational Policy Optimization for Safe Reinforcement Learning [40.38842532850959]
Safe reinforcement learning aims to learn policies that satisfy certain constraints before deploying to safety-critical applications.
The primal-dual method, a prevalent constrained optimization framework, suffers from instability issues and lacks optimality guarantees.
This paper overcomes these issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization-style approach to learning safe policies.
arXiv Detail & Related papers (2022-01-28T04:24:09Z) - Deep Constrained Q-learning [15.582910645906145]
In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a set of constraints.
We propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy.
arXiv Detail & Related papers (2020-03-20T17:26:03Z)
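To illustrate the action-space restriction described in the Constrained Q-learning entry above, the sketch below performs a tabular Q-update whose bootstrap maximum ranges only over actions allowed by the constraints in the next state. The `safe_actions` oracle is a placeholder assumption; the paper's actual constraint handling may be richer.

```python
# Sketch of a tabular Q-update whose bootstrap max ranges only over actions
# allowed by the constraints in the next state. The safe_actions oracle is a
# placeholder; the paper's constraint handling may be more involved.
import numpy as np


def constrained_q_update(Q, s, a, r, s_next, safe_actions, alpha=0.1, gamma=0.99):
    """One Q-learning step restricted to constraint-satisfying next actions."""
    target = r + gamma * Q[s_next, safe_actions].max()  # max only over safe actions
    Q[s, a] += alpha * (target - Q[s, a])


# Example: 4 states, 3 actions; only actions 0 and 2 are safe in state 2.
Q = np.zeros((4, 3))
constrained_q_update(Q, s=0, a=1, r=1.0, s_next=2, safe_actions=[0, 2])
print(Q[0, 1])  # -> 0.1
```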