Accelerating Safe Reinforcement Learning with Constraint-mismatched
Policies
- URL: http://arxiv.org/abs/2006.11645v3
- Date: Sat, 10 Jul 2021 02:55:37 GMT
- Title: Accelerating Safe Reinforcement Learning with Constraint-mismatched
Policies
- Authors: Tsung-Yen Yang and Justinian Rosca and Karthik Narasimhan and Peter J.
Ramadge
- Abstract summary: We consider the problem of reinforcement learning when provided with a baseline control policy and a set of constraints that the learner must satisfy.
We propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set.
Our algorithm consistently outperforms several state-of-the-art baselines, achieving 10 times fewer constraint violations and 40% higher reward on average.
- Score: 34.555500347840805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of reinforcement learning when provided with (1) a
baseline control policy and (2) a set of constraints that the learner must
satisfy. The baseline policy can arise from demonstration data or a teacher
agent and may provide useful cues for learning, but it might also be
sub-optimal for the task at hand, and is not guaranteed to satisfy the
specified constraints, which might encode safety, fairness or other
application-specific requirements. In order to safely learn from baseline
policies, we propose an iterative policy optimization algorithm that alternates
between maximizing expected return on the task, minimizing distance to the
baseline policy, and projecting the policy onto the constraint-satisfying set.
We analyze our algorithm theoretically and provide a finite-time convergence
guarantee. In our experiments on five different control tasks, our algorithm
consistently outperforms several state-of-the-art baselines, achieving 10 times
fewer constraint violations and 40% higher reward on average.
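The alternating scheme described in the abstract (improve expected return, stay close to the baseline policy, project onto the constraint-satisfying set) can be pictured with a short sketch. The Python below is a minimal toy illustration under stated assumptions: the parameter vector, step sizes, and the `reward_grad` / `cost_value_and_grad` stubs are illustrative placeholders, not the paper's actual algorithm or hyperparameters.

```python
# Minimal toy sketch of the alternating update described in the abstract:
# (1) improve expected return, (2) pull the policy toward the baseline,
# (3) project onto the (estimated) constraint-satisfying set.
# The parameter vector and the reward/cost stubs below are illustrative
# assumptions, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=4)           # current policy parameters (toy)
theta_baseline = rng.normal(size=4)  # baseline policy parameters (toy)
cost_limit = 1.0                     # constraint threshold d in J_C(pi) <= d

def reward_grad(params):
    """Stub standing in for a sampled policy-gradient estimate of the return."""
    return 1.0 - params              # toy surrogate whose return peaks at params = 1

def cost_value_and_grad(params):
    """Stub standing in for the expected constraint cost and its gradient."""
    return float(params @ params), 2.0 * params

lr, pull = 0.1, 0.5
for _ in range(50):
    # Step 1: reward-improvement step on the task objective.
    theta = theta + lr * reward_grad(theta)

    # Step 2: minimize distance to the baseline policy (here a simple move
    # part-way toward its parameters, standing in for a divergence penalty).
    theta = theta + pull * lr * (theta_baseline - theta)

    # Step 3: project onto {params : cost(params) <= cost_limit} by stepping
    # down the cost gradient until the current estimate is feasible.
    cost, g = cost_value_and_grad(theta)
    while cost > cost_limit:
        theta = theta - lr * g
        cost, g = cost_value_and_grad(theta)

print("final parameters:", theta)
print("final cost estimate:", cost_value_and_grad(theta)[0])
```

In the paper, the second and third steps are analyzed as projections and come with a finite-time convergence guarantee; the gradient-descent projection above is only a stand-in for exposition.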
Related papers
- Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets [6.5472155063246085]
Constrained reinforcement learning has achieved promising progress in safety-critical fields where both rewards and constraints are considered.
We propose Adversarial Constrained Policy Optimization (ACPO), which enables simultaneous optimization of reward and the adaptation of cost budgets during training.
arXiv Detail & Related papers (2024-10-28T07:04:32Z)
- Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning [26.244121960815907]
We propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence.
Our method employs a novel natural policy gradient manipulation method to optimize multiple RL objectives.
Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.
arXiv Detail & Related papers (2024-05-26T00:42:10Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which the constraint specifications are not identified before training.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve such problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
arXiv Detail & Related papers (2022-12-28T10:22:36Z)
- Safety-Constrained Policy Transfer with Successor Features [19.754549649781644]
We propose a Constrained Markov Decision Process (CMDP) formulation that enables the transfer of policies and adherence to safety constraints.
Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation.
Our experiments in simulated domains show that our approach is effective; it visits unsafe states less frequently and outperforms alternative state-of-the-art methods when taking safety constraints into account.
arXiv Detail & Related papers (2022-11-10T06:06:36Z)
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
- State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Deep Constrained Q-learning [15.582910645906145]
In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a set of constraints.
We propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy.
arXiv Detail & Related papers (2020-03-20T17:26:03Z)
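As a concrete illustration of the action-space restriction described in the Deep Constrained Q-learning entry above, the toy tabular sketch below limits both exploration and the max in the TD target to actions allowed by a safety check. The chain environment, the `safe_actions` oracle, and the tabular setting are assumptions for illustration; the paper itself works with deep Q-networks.

```python
# Toy tabular sketch of a constrained Q-update: the max in the TD target
# ranges only over actions that a safety check allows in the next state.
# The chain environment and the safe_actions oracle are illustrative
# assumptions; they are not the environments or networks used in the paper.
import numpy as np

n_states, n_actions = 5, 3
alpha, gamma = 0.5, 0.9
Q = np.zeros((n_states, n_actions))

def safe_actions(state):
    """Hypothetical constraint oracle: the fast action 2 is forbidden near the goal."""
    return [0, 1] if state >= 3 else [0, 1, 2]

def step(state, action):
    """Toy deterministic chain: action 0 stays, 1 moves right, 2 jumps two cells."""
    next_state = min(state + (2 if action == 2 else action), n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

rng = np.random.default_rng(1)
for _ in range(200):                                   # episodes
    s = 0
    for _ in range(20):                                # steps per episode
        a = int(rng.choice(safe_actions(s)))           # explore safe actions only
        s_next, r = step(s, a)
        target = r + gamma * max(Q[s_next, b] for b in safe_actions(s_next))
        Q[s, a] += alpha * (target - Q[s, a])          # constrained Q-update
        s = s_next

greedy = [max(safe_actions(s), key=lambda b: Q[s, b]) for s in range(n_states)]
print("greedy policy restricted to safe actions:", greedy)
```

Because the maximization never considers forbidden actions, the learned Q-function and its greedy policy correspond to the induced constrained MDP rather than the unconstrained one.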
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.