Solving Richly Constrained Reinforcement Learning through State
Augmentation and Reward Penalties
- URL: http://arxiv.org/abs/2301.11592v2
- Date: Wed, 31 May 2023 05:49:59 GMT
- Title: Solving Richly Constrained Reinforcement Learning through State
Augmentation and Reward Penalties
- Authors: Hao Jiang and Tien Mai and Pradeep Varakantham and Minh Huy Hoang
- Abstract summary: The key challenge is handling the expected cost accumulated by the policy.
Existing methods have developed innovative ways of converting this cost constraint over the entire policy into constraints over local decisions.
We provide an equivalent unconstrained formulation of constrained RL that has an augmented state space and reward penalties.
- Score: 8.86470998648085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constrained Reinforcement Learning has been employed to enforce safety
constraints on a policy through the use of expected cost constraints. The key
challenge is in handling the expected cost accumulated by the policy, and not
just the cost of a single step. Existing methods have developed innovative ways of
converting this cost constraint over the entire policy into constraints over local
decisions (at each time step). While such approaches have provided good
solutions with regard to the objective, they can be either overly aggressive or
overly conservative with respect to costs. This is owing to the use of estimates for
"future" or "backward" costs in local cost constraints.
To that end, we provide an equivalent unconstrained formulation of
constrained RL that has an augmented state space and reward penalties. This
intuitive formulation is general and has interesting theoretical properties.
More importantly, it provides a new paradigm for solving constrained RL
problems effectively. As we show in our experimental results, we are able to
outperform leading approaches on multiple benchmark problems from the literature.
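To make the paradigm concrete, here is a minimal sketch of the general idea: wrap the environment so that the observation carries the remaining cost budget and the cost constraint is folded into the reward as a penalty. It assumes a Gymnasium-style environment with a Box observation space that reports a per-step cost in its `info` dict; the wrapper name, the `"cost"` key, and the penalty scheme are illustrative, not the paper's exact construction.

```python
import numpy as np
import gymnasium as gym


class SafetyAugmentedEnv(gym.Wrapper):
    """Illustrative wrapper: augment the state with the remaining cost
    budget and surface the cost constraint as a reward penalty."""

    def __init__(self, env, cost_budget, penalty_weight):
        super().__init__(env)
        self.cost_budget = cost_budget        # expected-cost threshold
        self.penalty_weight = penalty_weight  # penalty scale (assumed hyperparameter)
        low = np.append(env.observation_space.low, -np.inf)
        high = np.append(env.observation_space.high, np.inf)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float64)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.remaining = self.cost_budget
        return np.append(obs, self.remaining), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        cost = info.get("cost", 0.0)  # per-step cost from the environment
        self.remaining -= cost
        # Once accumulated cost exceeds the budget, further cost is charged
        # against the reward, so an unconstrained learner is discouraged
        # from violating the constraint.
        if self.remaining < 0:
            reward -= self.penalty_weight * cost
        return np.append(obs, self.remaining), reward, terminated, truncated, info
```

Any off-the-shelf unconstrained RL algorithm (e.g., PPO) can then be trained on the wrapped environment, which is the shift in paradigm the abstract describes.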
Related papers
- Off-Policy Primal-Dual Safe Reinforcement Learning [16.918188277722503]
We show that the error in cumulative cost estimation causes significant underestimation of cost when using off-policy methods.
We propose conservative policy optimization, which learns a policy in a constraint-satisfying area by accounting for the uncertainty in estimation.
We then introduce local policy convexification to help eliminate the resulting suboptimality by gradually reducing the estimation uncertainty.
arXiv Detail & Related papers (2024-01-26T10:33:38Z)
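For context on the entry above: primal-dual safe RL methods pair policy updates with gradient ascent on a Lagrange multiplier that prices constraint violation. A minimal sketch of that generic dual step (the function name and learning rate are illustrative; this is the textbook update, not the paper's specific algorithm):

```python
def dual_update(lmbda, cost_estimate, cost_limit, lr=1e-3):
    # Gradient ascent on the dual variable: raise lambda when the estimated
    # cumulative cost exceeds the limit, lower it otherwise, and project
    # back onto lambda >= 0.
    return max(0.0, lmbda + lr * (cost_estimate - cost_limit))
```

The entry's point is that off-policy estimation biases `cost_estimate` downward, so the multiplier under-reacts; the proposed conservatism compensates for that bias.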
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which the constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and constraint satisfaction.
We propose a new constrained RL approach that searches for the policy and the constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories and avoids "bad" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z)
- Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning [20.280636126917614]
Safety in goal-directed Reinforcement Learning (RL) settings has typically been handled through constraints over trajectories.
We propose a (safety) Constrained Search with Hierarchical Reinforcement Learning (CoSHRL) mechanism that combines an upper-level constrained search agent with a low-level goal-conditioned RL agent.
A major advantage of CoSHRL is that it can handle constraints on the cost value distribution and can adjust to flexible constraint thresholds without retraining.
arXiv Detail & Related papers (2023-02-21T12:57:12Z)
- SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z)
- AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning [3.4806267677524896]
We propose AutoCost, a framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance.
We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs.
arXiv Detail & Related papers (2023-01-24T22:51:29Z)
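The additive shaping described in the AutoCost entry above amounts to a one-line combination; a sketch under the assumption that the searched intrinsic cost is a function of the state (all names are illustrative):

```python
def shaped_cost(extrinsic_cost, state, intrinsic_cost_fn):
    # The evolved intrinsic cost is added to the environment's extrinsic
    # cost; the constrained RL learner then treats the sum as its cost
    # signal when enforcing the constraint.
    return extrinsic_cost + intrinsic_cost_fn(state)
```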
- Enhancing Safe Exploration Using Safety State Augmentation [71.00929878212382]
We tackle the problem of safe exploration in model-free reinforcement learning.
We derive policies for scheduling the safety budget during training, an approach referred to as Simmer.
We show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
arXiv Detail & Related papers (2022-06-06T15:23:07Z)
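Budget scheduling of the kind the Simmer entry describes can be as simple as annealing the allowed cost over training; one plausible schedule, purely an assumption rather than the paper's actual scheduling policy:

```python
def scheduled_budget(step, total_steps, start_budget, final_budget):
    # Linearly anneal the safety budget from a loose starting value down
    # to the final constraint threshold as training progresses.
    frac = min(1.0, step / total_steps)
    return start_budget + frac * (final_budget - start_budget)
```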
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
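To illustrate the P3O entry above: the exact-penalty idea replaces the cost constraint with a hinge term added to a PPO-style loss, so one unconstrained minimization stands in for constrained policy iteration. The sketch below is a rough rendering of that shape; the variable names, where clipping is applied, and the advantage estimates are assumptions, not the paper's precise formulation.

```python
import torch

def penalized_ppo_loss(ratio, adv_r, adv_c, cost_excess, kappa, clip_eps=0.2):
    # Standard PPO clipped surrogate for the reward objective.
    surr_r = torch.min(ratio * adv_r,
                       torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv_r)
    # Hinge (exact) penalty: active only when the estimated cumulative
    # cost exceeds the limit (cost_excess = cost estimate - cost limit).
    penalty = torch.relu(ratio * adv_c + cost_excess)
    # Single unconstrained minimization in place of constrained iteration.
    return (-surr_r + kappa * penalty).mean()
```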
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We study the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)