SaFormer: A Conditional Sequence Modeling Approach to Offline Safe
Reinforcement Learning
- URL: http://arxiv.org/abs/2301.12203v1
- Date: Sat, 28 Jan 2023 13:57:01 GMT
- Title: SaFormer: A Conditional Sequence Modeling Approach to Offline Safe
Reinforcement Learning
- Authors: Qin Zhang and Linrui Zhang and Haoran Xu and Li Shen and Bowen Wang
and Yongzhe Chang and Xueqian Wang and Bo Yuan and Dacheng Tao
- Abstract summary: Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
- Score: 64.33956692265419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline safe RL is of great practical relevance for deploying agents in
real-world applications. However, acquiring constraint-satisfying policies from
the fixed dataset is non-trivial for conventional approaches. Even worse, the
learned constraints are stationary and may become invalid when the online
safety requirement changes. In this paper, we present a novel offline safe RL
approach referred to as SaFormer, which tackles the above issues via
conditional sequence modeling. In contrast to existing sequence models, we
propose cost-related tokens to restrict the action space and a posterior safety
verification to enforce the constraint explicitly. Specifically, SaFormer
performs a two-stage auto-regression conditioned on the maximum remaining cost
to generate feasible candidates. It then filters out unsafe attempts and
executes the optimal action with the highest expected return. Extensive
experiments demonstrate the efficacy of SaFormer featuring (1) competitive
returns with tightened constraint satisfaction; (2) adaptability to the
in-range cost values of the offline data without retraining; (3)
generalizability for constraints beyond the current dataset.
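The two-stage generate-and-verify procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model interface (model.sample), its predicted return/cost outputs, and the candidate count are assumptions made for clarity.

```python
def saformer_act(model, history, remaining_cost, n_candidates=16):
    """Minimal sketch of SaFormer-style two-stage decoding (hypothetical API).

    Stage 1: the sequence model, conditioned on the maximum remaining cost,
             auto-regressively samples candidate actions together with their
             predicted return and cost.
    Stage 2: posterior safety verification filters out candidates whose
             predicted cost would exceed the remaining budget; among the
             survivors, the action with the highest expected return is executed.
    """
    candidates = []
    for _ in range(n_candidates):
        # Assumed interface: returns (action, predicted_return, predicted_cost).
        action, pred_return, pred_cost = model.sample(
            history, cost_to_go=remaining_cost
        )
        candidates.append((action, pred_return, pred_cost))

    # Posterior safety verification: keep only budget-respecting candidates.
    safe = [(a, r) for (a, r, c) in candidates if c <= remaining_cost]
    if not safe:
        # Fallback (illustrative only): pick the least-costly candidate.
        return min(candidates, key=lambda t: t[2])[0]

    # Execute the safe candidate with the highest expected return.
    return max(safe, key=lambda t: t[1])[0]
```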
Related papers
- Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning [8.361428709513476]
This paper presents a positive-unlabeled (PU) learning approach to infer a continuous, arbitrary, and possibly nonlinear constraint from demonstrations.
The effectiveness of the proposed method is validated in two MuJoCo environments.
arXiv Detail & Related papers (2024-07-23T14:00:18Z) - OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning [30.540598779743455]
Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset.
This paper introduces a new paradigm in offline safe RL designed to overcome these critical limitations.
Our approach ensures compliance with safety constraints through effective data utilization and regularization techniques.
arXiv Detail & Related papers (2024-07-19T20:15:00Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains risks only in expectation, which leaves room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task-reward-maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion
Model [23.93820548551533]
We propose FISOR (FeasIbility-guided Safe Offline RL), which decouples safety constraint adherence, reward maximization, and offline policy learning.
In FISOR, the optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning.
We show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.
arXiv Detail & Related papers (2024-01-19T14:05:09Z) - Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning [33.988698754176646]
We introduce the Constraint-Conditioned Policy Optimization (CCPO) framework, consisting of two key modules.
Our experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance.
This makes our approach suitable for real-world dynamic applications.
arXiv Detail & Related papers (2023-10-05T17:39:02Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
arXiv Detail & Related papers (2023-02-14T21:27:10Z) - Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z) - COptiDICE: Offline Constrained Reinforcement Learning via Stationary
Distribution Correction Estimation [73.17078343706909]
We consider an offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
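To make the COptiDICE summary concrete, the underlying constrained problem can be written schematically in DICE-style notation; this is a generic formulation for illustration, and the paper's exact objective adds regularization and Bellman flow constraints:

$$
\max_{w \ge 0} \; \mathbb{E}_{(s,a) \sim d^{D}}\big[ w(s,a)\, r(s,a) \big]
\quad \text{s.t.} \quad
\mathbb{E}_{(s,a) \sim d^{D}}\big[ w(s,a)\, c(s,a) \big] \le \hat{c},
$$

where $d^{D}$ is the dataset's state-action distribution, $w(s,a) \approx d^{\pi}(s,a)/d^{D}(s,a)$ is the stationary distribution correction that COptiDICE estimates, $r$ is the reward, $c$ is the cost, and $\hat{c}$ is the cost budget.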
arXiv Detail & Related papers (2022-04-19T15:55:47Z) - Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint penalty that reduces divergence from the behavior policy, and a value-constraint penalty that discourages overly optimistic estimates.
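Schematically, the doubly constrained objective pairs return maximization with the two penalties mentioned above; the divergence measure and coefficients below are illustrative placeholders rather than the paper's exact formulation:

$$
\max_{\pi} \; \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi}\big[ Q(s,a) \big]
\;-\; \alpha \, D\big(\pi(\cdot \mid s) \,\|\, \pi_{\beta}(\cdot \mid s)\big)
\;-\; \lambda \, \mathcal{P}_{\text{value}}(Q),
$$

where $\pi_{\beta}$ is the behavior policy that generated the batch, $D$ is a divergence penalizing deviation from it (the policy constraint), and $\mathcal{P}_{\text{value}}$ is a penalty term discouraging overly optimistic value estimates (the value constraint).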
arXiv Detail & Related papers (2021-02-18T08:54:14Z)