Handling Long and Richly Constrained Tasks through Constrained
Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2302.10639v2
- Date: Tue, 9 Jan 2024 05:11:53 GMT
- Title: Handling Long and Richly Constrained Tasks through Constrained
Hierarchical Reinforcement Learning
- Authors: Yuxiao Lu, Arunesh Sinha and Pradeep Varakantham
- Abstract summary: Safety in goal-directed Reinforcement Learning (RL) settings has typically been handled through constraints over trajectories.
We propose a (safety) Constrained Search with Hierarchical Reinforcement Learning (CoSHRL) mechanism that combines an upper-level constrained search agent with a low-level goal-conditioned RL agent.
A major advantage of CoSHRL is that it can handle constraints on the cost value distribution and can adjust to flexible constraint thresholds without retraining.
- Score: 20.280636126917614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety in goal-directed Reinforcement Learning (RL) settings has typically
been handled through constraints over trajectories, and such approaches have
demonstrated good performance primarily in short-horizon tasks. In this paper, we are
specifically interested in solving temporally extended decision-making problems in the
presence of complex safety constraints, for example a robot cleaning different areas of
a house while avoiding slippery and unsafe areas (e.g., stairs) and retaining enough
charge to move to a charging dock. Our
key contribution is a (safety) Constrained Search with Hierarchical
Reinforcement Learning (CoSHRL) mechanism that combines an upper-level
constrained search agent (which computes a reward-maximizing policy from a
given start to a distant goal state while satisfying cost constraints) with a
low-level goal-conditioned RL agent (which estimates cost and reward values to
move between nearby states). A major advantage of CoSHRL is that it can handle
constraints on the cost value distribution (e.g., on Conditional Value at Risk,
CVaR) and can adjust to flexible constraint thresholds without retraining. We
perform extensive experiments with different types of safety constraints to
demonstrate the utility of our approach over leading approaches in constrained
and hierarchical RL.
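To make the distributional constraint concrete, here is a minimal sketch (not the authors' implementation) of how a CVaR constraint over sampled episode costs can be re-checked against different thresholds without retraining; the cost samples, `alpha`, and thresholds below are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): estimate Conditional Value at Risk
# (CVaR) of episode costs from samples and compare it against an adjustable
# threshold, mirroring how a distributional cost constraint can be
# re-evaluated for new thresholds without retraining.
import numpy as np

def cvar(costs: np.ndarray, alpha: float = 0.9) -> float:
    """CVaR_alpha: mean cost over the worst (1 - alpha) fraction of outcomes."""
    var = np.quantile(costs, alpha)      # Value at Risk (the alpha-quantile)
    tail = costs[costs >= var]           # worst-case tail of the cost distribution
    return float(tail.mean())

# Hypothetical episode costs gathered by a low-level goal-conditioned agent.
rng = np.random.default_rng(0)
episode_costs = rng.gamma(shape=2.0, scale=3.0, size=10_000)

risk = cvar(episode_costs, alpha=0.9)
for threshold in (8.0, 12.0, 16.0):      # flexible constraint thresholds
    status = "feasible" if risk <= threshold else "infeasible"
    print(f"CVaR_0.9 = {risk:.2f}, threshold {threshold}: {status}")
```

Because the risk measure is computed from stored cost samples (or a learned cost distribution), only the threshold comparison changes when the constraint is tightened or relaxed.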
Related papers
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task-reward-maximizing objectives according to a safety critic, as sketched below.
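As an illustration only (assuming a generic safety critic and tolerance, not the paper's definitions), the sketch below down-weights a reward-maximizing objective when predicted risk exceeds a threshold:

```python
# Minimal sketch of the objective-suppression idea (not the paper's algorithm):
# the task-reward objective is adaptively down-weighted when a safety critic
# predicts high risk. All names and numbers are illustrative assumptions.
def suppressed_objective(task_objective: float,
                         safety_critic_value: float,
                         risk_threshold: float = 0.1) -> float:
    """Down-weight the reward-maximizing objective as predicted risk grows."""
    risk = max(0.0, safety_critic_value)                    # e.g., expected constraint cost
    weight = 1.0 if risk <= risk_threshold else risk_threshold / risk
    return weight * task_objective

print(suppressed_objective(task_objective=5.0, safety_critic_value=0.05))  # unchanged
print(suppressed_objective(task_objective=5.0, safety_critic_value=0.5))   # suppressed
```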
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
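A minimal sketch of that split, assuming a simple cumulative-cost criterion (the data layout and threshold are hypothetical, not the paper's):

```python
# Minimal sketch of the "imitate good trajectories" idea (not the paper's
# algorithm): trajectories whose cumulative cost satisfies the constraint are
# kept as imitation targets; the rest are treated as negative examples.
from typing import List, Tuple

Trajectory = List[Tuple[int, int, float, float]]  # (state, action, reward, cost)

def split_trajectories(trajs: List[Trajectory], cost_limit: float):
    good, bad = [], []
    for traj in trajs:
        total_cost = sum(step[3] for step in traj)
        (good if total_cost <= cost_limit else bad).append(traj)
    return good, bad

demo = [[(0, 1, 1.0, 0.0), (1, 0, 1.0, 0.2)],   # low-cost trajectory
        [(0, 1, 2.0, 1.5), (1, 1, 2.0, 1.5)]]   # high-cost trajectory
good, bad = split_trajectories(demo, cost_limit=1.0)
print(len(good), "to imitate,", len(bad), "to avoid")
```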
arXiv Detail & Related papers (2023-12-16T08:48:46Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
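A minimal sketch of the multiplicative composition, assuming the safety critic outputs a probability of staying constraint-free (the numbers are illustrative, not from the paper):

```python
# Minimal sketch of a multiplicative value composition (not the paper's
# implementation): a safety critic estimating the probability of remaining
# constraint-free scales a reward critic that estimates constraint-free returns.
def multiplicative_value(p_safe: float, reward_value: float) -> float:
    """Value used for action selection: safety probability times reward value."""
    assert 0.0 <= p_safe <= 1.0
    return p_safe * reward_value

# An action with slightly lower reward but much higher safety can win.
candidates = {"aggressive": (0.60, 10.0), "cautious": (0.98, 8.0)}
best = max(candidates, key=lambda a: multiplicative_value(*candidates[a]))
print("selected action:", best)   # -> cautious
```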
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints [18.064813206191754]
We propose a trust region-based safe reinforcement learning algorithm for multiple constraints called a safe distributional actor-critic (SDAC).
Our main contributions are as follows: 1) introducing a gradient integration method to manage infeasibility issues in multi-constrained problems, ensuring theoretical convergence, and 2) developing a TD(λ) target distribution to estimate risk-averse constraints with low biases.
arXiv Detail & Related papers (2023-01-26T04:05:40Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in terms of the system safety rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training.
We show that the resulting approach, Simmer, can stabilize training and improve the performance of safe RL with average constraints.
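As a rough illustration of the idea of scheduling a safety budget during training, here is a simple linear ramp; the schedule shape and numbers are assumptions, not the paper's schedulers:

```python
# Minimal sketch of scheduling a safety budget during training (not the Simmer
# algorithm itself): the allowed cost budget is ramped up gradually so early
# training stays conservative. Schedule shape and values are illustrative.
def scheduled_budget(step: int, total_steps: int,
                     start_budget: float = 5.0, final_budget: float = 25.0) -> float:
    frac = min(1.0, step / total_steps)
    return start_budget + frac * (final_budget - start_budget)

for step in (0, 50_000, 100_000):
    print(step, scheduled_budget(step, total_steps=100_000))
```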
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
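A minimal sketch in the spirit of that penalty formulation, assuming a ReLU-style exact penalty added to a clipped surrogate (the coefficient and cost estimate below are illustrative, not the paper's exact objective):

```python
# Minimal sketch of a penalty-based surrogate in the spirit of P3O (not the
# paper's exact objective): the clipped PPO surrogate for reward is combined
# with a ReLU penalty on the estimated cost-constraint violation.
import numpy as np

def penalized_surrogate(ratio, adv_r, est_cost, cost_limit, kappa=10.0, clip=0.2):
    clipped = np.minimum(ratio * adv_r, np.clip(ratio, 1 - clip, 1 + clip) * adv_r)
    reward_term = clipped.mean()
    penalty = kappa * max(0.0, est_cost - cost_limit)   # exact-penalty term
    return reward_term - penalty                        # objective to maximize

ratio = np.array([1.1, 0.9, 1.3])       # new/old policy probability ratios
adv_r = np.array([0.5, -0.2, 1.0])      # reward advantages
print(penalized_surrogate(ratio, adv_r, est_cost=30.0, cost_limit=25.0))
```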
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
- Towards Safe Reinforcement Learning with a Safety Editor Policy [29.811723497181486]
We consider the safe reinforcement learning problem of maximizing utility while satisfying constraints.
We learn a safety editor policy that transforms potentially unsafe actions output by a utility maximizer policy into safe ones.
Our approach demonstrates outstanding utility performance while complying with the constraints.
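A minimal sketch of the editor pattern, with a simple box constraint standing in for the learned safety criterion (both policies below are hypothetical stand-ins):

```python
# Minimal sketch of a safety-editor pattern (not the paper's method): a
# utility-maximizing policy proposes an action, and an "editor" maps it to a
# nearby action that a simple safety check accepts (here: clipping to a box).
import numpy as np

def utility_policy(obs: np.ndarray) -> np.ndarray:
    return 2.0 * obs                       # stand-in for a learned policy

def safety_editor(action: np.ndarray, limit: float = 1.0) -> np.ndarray:
    """Edit potentially unsafe actions into safe ones."""
    return np.clip(action, -limit, limit)

obs = np.array([0.9, -0.1])
raw = utility_policy(obs)                  # possibly unsafe: [1.8, -0.2]
print("edited action:", safety_editor(raw))
```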
arXiv Detail & Related papers (2022-01-28T21:32:59Z)
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
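A minimal sketch of the resulting state-based check, assuming an estimate of cost already accrued ("backward") and a cost-to-go ("forward"); the decomposition and numbers are illustrative, not the paper's exact formulation:

```python
# Minimal sketch of turning a cumulative cost constraint into a state-based
# check (in the spirit of backward value functions): cost accrued to reach a
# state plus expected cost-to-go must stay under the budget.
def state_constraint_ok(backward_cost: float, forward_cost_to_go: float,
                        budget: float) -> bool:
    return backward_cost + forward_cost_to_go <= budget

print(state_constraint_ok(backward_cost=3.0, forward_cost_to_go=4.0, budget=10.0))  # True
print(state_constraint_ok(backward_cost=8.0, forward_cost_to_go=4.0, budget=10.0))  # False
```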
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.