Finding Safe Zones of Markov Decision Processes Policies
- URL: http://arxiv.org/abs/2202.11593v2
- Date: Mon, 9 Oct 2023 17:48:32 GMT
- Title: Finding Safe Zones of Markov Decision Processes Policies
- Authors: Lee Cohen, Yishay Mansour, Michal Moshkovitz
- Abstract summary: Given a policy of a Markov Decision Process, we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset.
The quality of a SafeZone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset.
- Score: 53.93816835129937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a policy of a Markov Decision Process, we define a SafeZone as a subset
of states, such that most of the policy's trajectories are confined to this
subset. The quality of a SafeZone is parameterized by the number of states and
the escape probability, i.e., the probability that a random trajectory will
leave the subset. SafeZones are especially interesting when they have a small
number of states and low escape probability. We study the complexity of finding
optimal SafeZones, and show that in general, the problem is computationally
hard. Our main result is a bi-criteria approximation learning algorithm with an
approximation factor of almost $2$ for both the escape probability and the
SafeZone size, using polynomial sample complexity.
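To make the two quality measures concrete, here is a minimal sketch (not the paper's algorithm) of estimating the escape probability of a candidate SafeZone from sampled trajectories of the fixed policy; the trajectory format and the greedy most-visited-states baseline are assumptions introduced only for illustration.

```python
# Minimal sketch (not the paper's algorithm): estimate the escape probability of
# a candidate SafeZone from sampled trajectories of the fixed policy, and build
# a naive candidate by greedily keeping the most frequently visited states.
from collections import Counter

def escape_probability(trajectories, safe_zone):
    """Fraction of sampled trajectories that leave the candidate SafeZone."""
    escaped = sum(any(s not in safe_zone for s in traj) for traj in trajectories)
    return escaped / len(trajectories)

def greedy_safe_zone(trajectories, k):
    """Naive baseline (an assumption for illustration): the k most visited states."""
    visits = Counter(s for traj in trajectories for s in traj)
    return {s for s, _ in visits.most_common(k)}

# Usage sketch: roll out the policy to collect trajectories, then trade off the
# SafeZone size k against the estimated escape probability.
# zone = greedy_safe_zone(trajs, k=20)
# print(len(zone), escape_probability(trajs, zone))
```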
Related papers
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
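For orientation, such methods rest on Gaussian-process confidence bounds that certify parameters as safe before they are evaluated; the sketch below shows only that generic building block (not the paper's information-theoretic criterion), with the kernel, confidence multiplier `beta`, and `threshold` chosen arbitrarily for illustration.

```python
# Sketch of the generic GP-based safe-set test such methods build on (the
# paper's information-theoretic acquisition rule is not reproduced here): a
# candidate x is treated as safe when a lower confidence bound on the unknown
# constraint g(x) clears the safety threshold.
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def safe_candidates(X_obs, g_obs, X_cand, threshold=0.0, beta=2.0):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
    gp.fit(X_obs, g_obs)                          # observed safety values
    mu, std = gp.predict(X_cand, return_std=True)
    return X_cand[mu - beta * std >= threshold]   # pessimistic safety test
```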
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the methods under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Near-Optimal Multi-Agent Learning for Safe Coverage Control [76.99020416197631]
In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density.
In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety.
We give first-of-their-kind results: near-optimal coverage in finite time while provably guaranteeing safety.
arXiv Detail & Related papers (2022-10-12T16:33:34Z)
- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe exploration requirement would affect the sample complexity needed to achieve the desired optimality of the obtained policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework and develop two algorithms, coined Tabular-SWEET and Low-rank-SWEET.
arXiv Detail & Related papers (2022-06-28T15:00:45Z)
- Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a randomized sampling (RS) rule that follows the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is agnostically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
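As background, the augmented inverse probability weighting (AIPW) estimator combines inverse-propensity-weighted rewards with a plug-in mean estimate; the sketch below shows only that generic estimator, and how the plug-in means and sampling probabilities are formed in the paper's strategy is not reproduced.

```python
# Generic AIPW estimate of one arm's mean (a sketch of the estimator family the
# paper builds on, not its exact strategy). At round t: arms[t] is the arm
# pulled, rewards[t] the observed reward, probs[t] the probability with which
# `arm` would have been pulled, and plugin_means[t] a plug-in estimate of the
# arm's mean computed from rounds before t.
def aipw_mean(arm, arms, rewards, probs, plugin_means):
    total = 0.0
    for a_t, y_t, pi_t, mu_t in zip(arms, rewards, probs, plugin_means):
        indicator = 1.0 if a_t == arm else 0.0
        total += indicator * (y_t - mu_t) / pi_t + mu_t
    return total / len(arms)

# The arm recommended once the budget is exhausted is the one with the larger
# AIPW mean estimate; the sampling probabilities themselves track a Neyman-type
# allocation based on the estimated standard deviations.
```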
arXiv Detail & Related papers (2022-01-12T13:38:33Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- Expanding boundaries of Gap Safe screening [0.0]
A powerful strategy to boost the performance of algorithms is known as safe screening.
We extend the existing Gap Safe screening framework by relaxing the global strong-concavity assumption on the dual cost function.
The proposed general framework is exemplified by some notable particular cases: the logistic function, and the beta = 1.5 and Kullback-Leibler divergences.
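For reference, the sketch below shows the classical Gap Safe test for the Lasso, the strongly-concave-dual setting that this paper generalizes beyond; the rule and the dual-point construction follow the standard Gap Safe literature rather than the extended framework itself.

```python
# Classical Gap Safe screening test for the Lasso (the strongly concave dual
# case that the paper extends); a standard-literature sketch, not the paper's
# generalized rule.
import numpy as np

def gap_safe_screen(X, y, beta, lam):
    """Boolean mask of features whose coefficients can be safely set to zero."""
    residual = y - X @ beta
    # Dual-feasible point obtained by rescaling the residual.
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    primal = 0.5 * residual @ residual + lam * np.sum(np.abs(beta))
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(max(primal - dual, 0.0) * 2.0) / lam
    # Screen feature j when its correlation with the dual point, inflated by
    # the sphere radius times the feature norm, stays strictly below 1.
    return np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0) < 1.0
```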
arXiv Detail & Related papers (2021-02-22T09:23:31Z)
- Learning to be safe, in finite time [4.189643331553922]
This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability-one guarantees, can be achieved without an unbounded number of exploratory trials.
We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic within safe learning.
arXiv Detail & Related papers (2020-10-01T14:03:34Z)
- Enforcing Almost-Sure Reachability in POMDPs [10.883864654718103]
Partially-Observable Markov Decision Processes (POMDPs) are a well-known model for sequential decision making under limited information.
We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state.
We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative.
arXiv Detail & Related papers (2020-06-30T19:59:46Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
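For context on the CMDP formulation, a generic Lagrangian primal-dual scheme alternates a policy update on the reward-minus-weighted-cost objective with a dual ascent step on the constraint violation; the loop below is only that generic scheme (the paper's provably efficient algorithm, exploration bonuses, and function approximation are not reproduced), and `policy_improve` / `evaluate` are hypothetical helpers.

```python
# Generic Lagrangian primal-dual loop for a CMDP (a sketch of the formulation,
# not the paper's algorithm). `policy_improve` and `evaluate` are hypothetical
# helpers: the former improves the policy for the Lagrangian reward r - lam * c,
# the latter returns the policy's expected reward and expected cost.
def primal_dual_cmdp(policy_improve, evaluate, cost_limit, iters=100, step=0.1):
    lam = 0.0                                 # Lagrange multiplier (dual variable)
    policy = None
    for _ in range(iters):
        policy = policy_improve(policy, lam)  # primal step on r - lam * c
        _, cost = evaluate(policy)
        lam = max(0.0, lam + step * (cost - cost_limit))  # projected dual ascent
    return policy, lam
```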
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.