Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion
Model
- URL: http://arxiv.org/abs/2401.10700v1
- Date: Fri, 19 Jan 2024 14:05:09 GMT
- Title: Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion
Model
- Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li,
Xianyuan Zhan, Jingjing Liu
- Abstract summary: We propose FISOR (FeasIbility-guided Safe Offline RL), which decouples safety constraint adherence, reward maximization, and offline policy learning.
In FISOR, the optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning.
We show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.
- Score: 23.93820548551533
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Safe offline RL is a promising way to bypass risky online interactions
towards safe policy learning. Most existing methods only enforce soft
constraints, i.e., constraining safety violations in expectation below
predetermined thresholds. This can lead to potentially unsafe outcomes, which is
unacceptable in safety-critical scenarios. An alternative is to enforce the
hard constraint of zero violation. However, this can be challenging in the
offline setting, as it needs to strike the right balance among three highly intricate
and correlated aspects: safety constraint satisfaction, reward maximization,
and behavior regularization imposed by offline datasets. Interestingly, we
discover that via reachability analysis of safe-control theory, the hard safety
constraint can be equivalently translated to identifying the largest feasible
region given the offline dataset. This seamlessly converts the original trilogy
problem to a feasibility-dependent objective, i.e., maximizing reward value
within the feasible region while minimizing safety risks in the infeasible
region. Inspired by this, we propose FISOR (FeasIbility-guided Safe Offline
RL), which allows safety constraint adherence, reward maximization, and offline
policy learning to be realized via three decoupled processes, while offering
strong safety performance and stability. In FISOR, the optimal policy for the
translated optimization problem can be derived in a special form of weighted
behavior cloning. Thus, we propose a novel energy-guided diffusion model that
does not require training a complicated time-dependent classifier to extract
the policy, greatly simplifying the training. We compare FISOR against
baselines on the DSRL benchmark for safe offline RL. Evaluation results show that
FISOR is the only method that can guarantee safety satisfaction in all tasks,
while achieving top returns in most tasks.
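The feasibility-dependent objective described in the abstract can be pictured as a per-sample weighting rule for behavior cloning: inside the estimated feasible region, dataset actions are up-weighted by their reward advantage; in the infeasible region, they are up-weighted by how much they reduce safety risk. Below is a minimal, hypothetical NumPy sketch of that weighting idea. The names (bc_weights, feasible_value, reward_advantage, cost_advantage, temperature) and the sign convention used for feasibility are illustrative assumptions, not FISOR's actual formulation or interface.

# Hypothetical sketch of feasibility-dependent weights for weighted behavior
# cloning. Assumption: feasible_value >= 0 marks a state as feasible (e.g.,
# output of a learned reachability/feasibility function); the sign convention
# is illustrative only.
import numpy as np

def bc_weights(feasible_value, reward_advantage, cost_advantage, temperature=1.0):
    feasible = feasible_value >= 0.0
    # Feasible states: prefer dataset actions with high reward advantage.
    w_feasible = np.exp(reward_advantage / temperature)
    # Infeasible states: prefer dataset actions that reduce safety cost.
    w_infeasible = np.exp(-cost_advantage / temperature)
    w = np.where(feasible, w_feasible, w_infeasible)
    # Clip for numerical stability, as is common in advantage-weighted methods.
    return np.clip(w, 0.0, 100.0)

# Toy usage: three transitions, the last one in an (assumed) infeasible state.
weights = bc_weights(
    feasible_value=np.array([0.5, 0.1, -0.3]),
    reward_advantage=np.array([1.2, -0.4, 0.8]),
    cost_advantage=np.array([0.0, 0.2, -1.5]),
)
print(weights)  # larger weight -> sample contributes more to behavior cloning

In this sketch the weights would then scale a standard behavior-cloning loss; the abstract's energy-guided diffusion model plays the role of the policy extractor, which is not reproduced here.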
Related papers
- Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety inReinforcement Learning (RL)
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning [4.14360329494344]
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades.
Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety.
Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process.
We propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment.
arXiv Detail & Related papers (2024-02-24T20:01:15Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Iterative Reachability Estimation for Safe Reinforcement Learning [23.942701020636882]
We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained reinforcement learning (RL) environments.
In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety.
We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo.
arXiv Detail & Related papers (2023-09-24T02:36:42Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
arXiv Detail & Related papers (2023-02-14T21:27:10Z) - SaFormer: A Conditional Sequence Modeling Approach to Offline Safe
Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z) - Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning [15.841609263723575]
We study the problem of safe offline reinforcement learning (RL).
The goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment.
We show that na"ive approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions.
arXiv Detail & Related papers (2021-07-19T16:30:14Z)