Offline Goal-Conditioned Reinforcement Learning for Safety-Critical
Tasks with Recovery Policy
- URL: http://arxiv.org/abs/2403.01734v1
- Date: Mon, 4 Mar 2024 05:20:57 GMT
- Title: Offline Goal-Conditioned Reinforcement Learning for Safety-Critical
Tasks with Recovery Policy
- Authors: Chenyang Cao, Zichen Yan, Renhao Lu, Junbo Tan, Xueqian Wang
- Abstract summary: offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset.
We propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals.
- Score: 4.854443247023496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline goal-conditioned reinforcement learning (GCRL) aims at solving
goal-reaching tasks with sparse rewards from an offline dataset. While prior
work has demonstrated various approaches for agents to learn near-optimal
policies, these methods encounter limitations when dealing with diverse
constraints in complex environments, such as safety constraints. Some of these
approaches prioritize goal attainment without considering safety, while others
excessively focus on safety at the expense of training efficiency. In this
paper, we study the problem of constrained offline GCRL and propose a new
method called Recovery-based Supervised Learning (RbSL) to accomplish
safety-critical tasks with various goals. To evaluate the method's performance,
we build a benchmark based on the robot-fetching environment with a randomly
positioned obstacle and use expert or random policies to generate an offline
dataset. We compare RbSL with three offline GCRL algorithms and one offline
safe RL algorithm. As a result, our method outperforms the existing
state-of-the-art methods by a large margin. Furthermore, we validate the
practicality and effectiveness of RbSL by deploying it on a real Panda
manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.
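As a rough illustration of the control flow the abstract describes (a goal-reaching policy backed by a recovery policy that handles safety), here is a minimal Python sketch. The names goal_policy, recovery_policy, cost_critic, and cost_threshold are hypothetical stand-ins, not the interfaces of the linked repository; the actual RbSL method is trained offline and more involved.

import numpy as np

# Minimal sketch of a recovery-style action selection loop: pursue the goal
# unless the proposed action is predicted to violate the safety constraint,
# in which case a recovery policy steers the agent back to safe states.
def act(obs, goal, goal_policy, recovery_policy, cost_critic, cost_threshold=0.1):
    task_action = goal_policy(obs, goal)       # goal-reaching proposal
    est_cost = cost_critic(obs, task_action)   # predicted constraint violation
    if est_cost > cost_threshold:
        return recovery_policy(obs)            # hand control to the recovery policy
    return task_action

# Toy usage with random stand-ins, only to show the interface.
rng = np.random.default_rng(0)
obs, goal = rng.normal(size=10), rng.normal(size=3)
action = act(
    obs, goal,
    goal_policy=lambda o, g: rng.uniform(-1, 1, size=4),
    recovery_policy=lambda o: np.zeros(4),
    cost_critic=lambda o, a: float(rng.uniform()),
)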
Related papers
- FOSP: Fine-tuning Offline Safe Policy through World Models [3.7971075341023526]
Model-based Reinforcement Learning (RL) has shown high training efficiency and the capability to handle high-dimensional tasks.
However, prior works still face safety challenges due to online exploration during real-world deployment.
In this paper, we aim to further enhance safety during the deployment stage for vision-based robotic tasks by fine-tuning an offline-trained policy.
arXiv Detail & Related papers (2024-07-06T03:22:57Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
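To make the idea concrete, here is a minimal Python sketch of action quantization: cluster the actions seen in the offline dataset into a small codebook so that discrete-action offline RL methods can be applied. This fixed k-means codebook only illustrates the basic idea; the paper's adaptive quantization scheme is more sophisticated.

import numpy as np
from sklearn.cluster import KMeans

def build_codebook(dataset_actions, n_codes=32):
    """Cluster the dataset's continuous actions into a discrete codebook."""
    km = KMeans(n_clusters=n_codes, n_init=10, random_state=0).fit(dataset_actions)
    return km.cluster_centers_

def quantize(action, codebook):
    """Map a continuous action to the index of its nearest code."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))

actions = np.random.default_rng(0).uniform(-1, 1, size=(5000, 4))  # stand-in dataset
codebook = build_codebook(actions)
idx = quantize(actions[0], codebook)   # discrete action index for the RL method
executed = codebook[idx]               # continuous action the agent actually executes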
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
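A generic Python sketch of behavior-regularized policy improvement with an iteratively refined reference policy, in the spirit of the summary above (this is not the paper's exact algorithm): the actor is pushed to increase the critic's value while staying close to a reference policy that is periodically replaced by the current, improved policy.

import torch
import torch.nn as nn

state_dim, action_dim = 17, 6
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
reference = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
reference.load_state_dict(actor.state_dict())   # start the reference at the current actor
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def actor_loss(states, alpha=2.5):
    actions = actor(states)
    with torch.no_grad():
        ref_actions = reference(states)
    q = critic(torch.cat([states, actions], dim=-1))
    # Maximize value while penalizing divergence from the reference policy.
    return -q.mean() + alpha * ((actions - ref_actions) ** 2).mean()

for step in range(1, 1001):
    states = torch.randn(256, state_dim)        # stand-in for an offline batch
    loss = actor_loss(states)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 200 == 0:
        reference.load_state_dict(actor.state_dict())  # iteratively refine the reference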
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
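The adjustable trade-off comes from conditioning the policy on both a target return and a target cost budget, which can be changed at deployment time. A minimal Python sketch of that conditioning interface follows; a small MLP stands in for the actual decision-transformer sequence model, so this is only an illustration of the input layout, not CDT itself.

import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
policy = nn.Sequential(
    nn.Linear(state_dim + 2, 64), nn.ReLU(),   # +2 for (return-to-go, cost-to-go)
    nn.Linear(64, action_dim), nn.Tanh(),
)

def act(state, target_return, cost_budget):
    cond = torch.tensor([target_return, cost_budget], dtype=torch.float32)
    return policy(torch.cat([state, cond]))

state = torch.randn(state_dim)
cautious_action = act(state, target_return=300.0, cost_budget=5.0)
aggressive_action = act(state, target_return=300.0, cost_budget=50.0)  # looser budget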
arXiv Detail & Related papers (2023-02-14T21:27:10Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
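To give a feel for the "safety projection" ingredient mentioned above, here is the classic closed-form safety-layer correction for a single linearized cost constraint, written in Python. It minimally shifts a proposed action so the predicted cost stays within the limit; it illustrates the projection idea only, not USL's exact formulation.

import numpy as np

def project_action(action, cost_now, cost_grad, cost_limit):
    """Minimally shift `action` so that cost_now + cost_grad . a <= cost_limit."""
    violation = cost_now + cost_grad @ action - cost_limit
    if violation <= 0.0:
        return action                      # already predicted safe
    # Closed-form solution of min ||a' - a||^2  s.t.  cost_now + g . a' <= limit
    return action - (violation / (cost_grad @ cost_grad + 1e-8)) * cost_grad

a = np.array([0.8, -0.2])
g = np.array([1.0, 0.5])                   # gradient of predicted cost w.r.t. action
safe_a = project_action(a, cost_now=0.4, cost_grad=g, cost_limit=0.5)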
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
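A minimal Python sketch of the two-policy rollout described above: a guide policy (from offline data, demonstrations, or a prior controller) rolls in for the first steps of each episode, then the learning policy takes over, and a curriculum gradually shrinks the guide roll-in. The env, guide_policy, and explore_policy names are hypothetical stand-ins, and the old Gym-style step interface is assumed.

import numpy as np

def jsrl_rollout(env, guide_policy, explore_policy, guide_steps, max_steps=200):
    obs = env.reset()
    trajectory = []
    for t in range(max_steps):
        policy = guide_policy if t < guide_steps else explore_policy  # hand-off point
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return trajectory

for guide_steps in (150, 100, 50, 0):      # curriculum: shrink the guide roll-in
    # In training, collect rollouts with jsrl_rollout(...) and update the
    # exploration policy, advancing once its returns match the guide's.
    pass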
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning [15.841609263723575]
We study the problem of safe offline reinforcement learning (RL).
The goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment.
We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions.
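The constrained objective described here, maximizing return subject to a cost budget, is often handled with a Lagrangian relaxation: a dual variable weights a cost penalty and grows whenever the estimated cost exceeds the budget. The Python sketch below shows that generic relaxation only, not the CPQ algorithm proposed in the paper.

import torch

log_lam = torch.zeros(1, requires_grad=True)     # dual variable, kept positive via exp
dual_opt = torch.optim.Adam([log_lam], lr=1e-3)
cost_limit = 10.0

def policy_objective(q_reward, q_cost):
    """Scalar the actor maximizes: reward value minus the weighted cost value."""
    lam = log_lam.exp().detach()
    return (q_reward - lam * q_cost).mean()

def dual_update(q_cost):
    """Increase lambda when estimated cost exceeds the budget, decrease otherwise."""
    lam = log_lam.exp()
    dual_loss = -(lam * (q_cost.mean().detach() - cost_limit))
    dual_opt.zero_grad(); dual_loss.backward(); dual_opt.step()

# Toy usage with random values standing in for learned Q estimates.
q_r, q_c = torch.randn(256) + 5.0, torch.randn(256) + 12.0
obj = policy_objective(q_r, q_c)
dual_update(q_c)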
arXiv Detail & Related papers (2021-07-19T16:30:14Z)
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs between different reward signals.
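For context, here is a Python sketch of the standard single-objective SPIBB improvement step in a finite MDP: on state-action pairs that were rarely observed offline, the new policy copies the baseline, and the remaining probability mass moves to the best estimated well-observed action. The paper's multi-objective, preference-aware extension goes beyond this sketch.

import numpy as np

def spibb_improve(q_values, baseline_probs, counts, n_wedge=20):
    """One-state greedy SPIBB projection. All arrays have shape (n_actions,)."""
    new_probs = np.zeros_like(baseline_probs)
    uncertain = counts < n_wedge
    new_probs[uncertain] = baseline_probs[uncertain]       # bootstrap to the baseline
    free_mass = baseline_probs[~uncertain].sum()
    if free_mass > 0.0:
        sure_actions = np.flatnonzero(~uncertain)
        best = sure_actions[np.argmax(q_values[sure_actions])]
        new_probs[best] += free_mass                        # exploit where data suffices
    return new_probs

q = np.array([1.0, 2.5, 0.3, 2.0])
pi_b = np.array([0.25, 0.25, 0.25, 0.25])
counts = np.array([50, 5, 40, 30])
print(spibb_improve(q, pi_b, counts))   # action 1 keeps its baseline mass; the rest goes to action 3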
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.