Recursive Constraints to Prevent Instability in Constrained
Reinforcement Learning
- URL: http://arxiv.org/abs/2201.07958v1
- Date: Thu, 20 Jan 2022 02:33:24 GMT
- Title: Recursive Constraints to Prevent Instability in Constrained
Reinforcement Learning
- Authors: Jaeyoung Lee, Sean Sedwards and Krzysztof Czarnecki
- Abstract summary: We consider the challenge of finding a deterministic policy for a Markov decision process that uniformly (in all states) maximizes one reward subject to a probabilistic constraint over a different reward.
This class of problem is known to be hard, but the combined requirements of determinism and uniform optimality can create learning instability.
We present a suitable constrained reinforcement learning algorithm that prevents learning instability.
- Score: 16.019477271828745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the challenge of finding a deterministic policy for a Markov
decision process that uniformly (in all states) maximizes one reward subject to
a probabilistic constraint over a different reward. Existing solutions do not
fully address our precise problem definition, which nevertheless arises
naturally in the context of safety-critical robotic systems. This class of
problem is known to be hard, but the combined requirements of determinism and
uniform optimality can create learning instability. In this work, after
describing and motivating our problem with a simple example, we present a
suitable constrained reinforcement learning algorithm that prevents learning
instability, using recursive constraints. Our proposed approach admits an
approximative form that improves efficiency and is conservative w.r.t. the
constraint.
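For readers who prefer symbols, the problem in the abstract can be written roughly as follows. The notation ($\Pi_{\mathrm{det}}$ for deterministic stationary policies, task reward $R$, a second constraint-defining reward $C$, threshold $p$) is illustrative and not necessarily the paper's own:

```latex
% Rough formalization of the problem statement above (illustrative notation).
% F collects the deterministic policies that satisfy the probabilistic
% constraint, defined via the second reward C, from every state.
\[
  \mathcal{F} \;=\; \bigl\{\, \pi \in \Pi_{\mathrm{det}} \;:\;
    \Pr\bigl[\text{constraint on } C \text{ holds} \mid s_0 = s,\ \pi \bigr] \;\ge\; p
    \ \text{ for all } s \in S \,\bigr\},
\]
\[
  \text{find } \pi^{\ast} \in \mathcal{F}
  \quad\text{such that}\quad
  V_R^{\pi^{\ast}}(s) \;=\; \max_{\pi \in \mathcal{F}} V_R^{\pi}(s)
  \quad\text{for all } s \in S .
\]
```

The "for all $s$" on both the constraint and the value equation is the uniform-optimality requirement that, combined with determinism, the abstract identifies as the source of learning instability addressed by the recursive constraints.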
Related papers
- Learning Adversarial MDPs with Stochastic Hard Constraints [37.24692425018]
We study online learning problems in constrained decision processes with adversarial losses and hard constraints.
We design an algorithm that achieves sublinear regret while ensuring that the constraints are satisfied at every episode with high probability.
arXiv Detail & Related papers (2024-03-06T12:49:08Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified in advance.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z) - On Bellman's principle of optimality and Reinforcement learning for
safety-constrained Markov decision process [0.0]
We study optimality for the safety-constrained Markov decision process, which is the underlying framework for safe reinforcement learning.
We construct a modified $Q$-learning algorithm for learning the Lagrangian from data.
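The construction itself is in the cited paper; purely to illustrate the general Lagrangian-relaxation style of update it alludes to, a minimal tabular sketch could look like the following (the fixed multiplier `lam` and all names are assumptions, not the paper's algorithm):

```python
import numpy as np

def lagrangian_q_update(Q, s, a, r, c, s_next, lam, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on the Lagrangian reward r - lam * c.

    Illustrative only: a fixed multiplier lam trades the task reward r
    off against the constraint cost c; the cited paper instead learns
    the Lagrangian from data.
    """
    target = (r - lam * c) + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

A policy that is greedy with respect to such a Q-table then balances reward and constraint cost through the multiplier.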
arXiv Detail & Related papers (2023-02-25T20:36:41Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast the inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses the apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
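As a generic illustration of the log-barrier idea (not LBSGD's step-size rule, which is the paper's main contribution), inequality constraints $g_i(x) \le 0$ can be folded into the objective via $-\log(-g_i(x))$; the hypothetical helper below assumes strictly feasible inputs:

```python
import numpy as np

def log_barrier_objective(f, gs, x, eta=0.1):
    """Barrier-augmented objective f(x) + eta * sum_i -log(-g_i(x)).

    Illustrative sketch: f is the loss, gs are constraint functions with
    g_i(x) <= 0 required; the barrier term grows without bound near the
    constraint boundary, discouraging unsafe iterates. LBSGD additionally
    chooses the step size so that iterates remain strictly feasible.
    """
    barrier = sum(-np.log(-g(x)) for g in gs)  # requires g(x) < 0 for all g
    return f(x) + eta * barrier
```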
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and replaces the trust-region constraint with the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
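Only as a schematic of that recipe (a PPO-style clipped surrogate plus a penalty on constraint violation; the coefficient `kappa` and the specific form are assumptions, not P3O's published loss):

```python
import numpy as np

def penalized_clipped_loss(ratio, adv_r, cost_violation, kappa=10.0, eps=0.2):
    """Schematic penalized clipped-surrogate loss (to be minimized).

    ratio:          new-policy / old-policy probability ratios per sample
    adv_r:          reward advantages per sample
    cost_violation: estimated expected cost in excess of the budget (scalar)
    The first term is the standard clipped surrogate; the second turns the
    cost constraint into a penalty, as the summary above describes.
    """
    clipped = np.minimum(ratio * adv_r,
                         np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv_r)
    return -np.mean(clipped) + kappa * max(0.0, cost_violation)
```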
arXiv Detail & Related papers (2022-05-24T06:15:51Z) - Assured RL: Reinforcement Learning with Almost Sure Constraints [0.0]
We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets.
We define value and action-value functions that satisfy a barrier-based decomposition.
We develop a Barrier-learning algorithm, based on Q-Learning, that identifies unsafe state-action pairs.
arXiv Detail & Related papers (2020-12-24T00:29:28Z) - Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
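To make the "cumulative cost constraints into state-based constraints" idea concrete at a very high level (names below are illustrative, not the paper's backward value functions): if one can estimate, at each state, the expected cost already accrued on the way there and the expected cost still to come, the global budget can be checked locally:

```python
def satisfies_state_constraint(cost_so_far, expected_cost_to_go, budget):
    """Illustrative local check induced by a cumulative cost budget.

    cost_so_far:         estimated cost accumulated up to the current state
                         (a "backward" quantity)
    expected_cost_to_go: estimated remaining cost from the current state
    budget:              the original cumulative cost budget
    The cited paper learns these quantities (backward value functions) and
    proves convergence and safety guarantees; this sketch is only the intuition.
    """
    return cost_so_far + expected_cost_to_go <= budget
```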
arXiv Detail & Related papers (2020-08-26T20:56:16Z)