Related papers: Safe Reinforcement Learning in Constrained Markov Decision Processes

Safe Reinforcement Learning in Constrained Markov Decision Processes

URL: http://arxiv.org/abs/2008.06626v1
Date: Sat, 15 Aug 2020 02:20:23 GMT
Title: Safe Reinforcement Learning in Constrained Markov Decision Processes
Authors: Akifumi Wachi and Yanan Sui
Abstract summary: We propose an algorithm, SNO-MDP, that explores and optimize Markov decision processes under unknown safety constraints. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward.
Score: 20.175139766171277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. Specifically, we take a stepwise approach for optimizing safety and cumulative reward. In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward under proper regularity assumptions. In our experiments, we demonstrate the effectiveness of SNO-MDP through two experiments: one uses a synthetic data in a new, openly-available environment named GP-SAFETY-GYM, and the other simulates Mars surface exploration by using real observation data.

Related papers

Saffron-1: Safety Inference Scaling [69.61130284742353]
SAFFRON is a novel inference scaling paradigm tailored explicitly for safety assurance.<n>Central to our approach is the introduction of a multifurcation reward model (MRM) that significantly reduces the required number of reward model evaluations.<n>We publicly release our trained multifurcation reward model (Saffron-1) and the accompanying token-level safety reward dataset (Safety4M)
arXiv Detail & Related papers (2025-06-06T18:05:45Z)
Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z)
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation [26.244121960815907]
Managing the trade-off between reward and safety during exploration presents a significant challenge. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.
arXiv Detail & Related papers (2024-05-02T19:07:14Z)
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization. In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints. Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution of the safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE. Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z)
Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation [3.5884936187733394]
We address the problem of safe reinforcement learning from pixel observations. We formalize the problem in a constrained, partially observable Markov decision process framework. We employ a novel safety critic using the latent actor-critic (SLAC) approach.
arXiv Detail & Related papers (2022-10-02T19:55:42Z)
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial. Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size. We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety critical tasks modeled via constrained Markov decision processes. We demonstrate LAMBDA's state of the art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z)
Safe Policy Optimization with Local Generalized Linear Function Approximations [17.84511819022308]
Existing safe exploration methods guaranteed safety under the assumption of regularity. We propose a novel algorithm, SPO-LF, that optimize an agent's policy while learning the relation between a locally available feature obtained by sensors and environmental reward/safety. We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees.
arXiv Detail & Related papers (2021-11-09T00:47:50Z)
Provably Safe PAC-MDP Exploration Using Analogies [87.41775218021044]
Key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration and safety. We propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.
arXiv Detail & Related papers (2020-07-07T15:50:50Z)
SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.