Learning to be safe, in finite time
- URL: http://arxiv.org/abs/2010.00417v2
- Date: Wed, 31 Mar 2021 14:44:00 GMT
- Title: Learning to be safe, in finite time
- Authors: Agustin Castellano, Juan Bazerque, Enrique Mallada
- Abstract summary: This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials.
We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic to safe learning.
- Score: 4.189643331553922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to put forward the concept that learning to take safe actions
in unknown environments, even with probability one guarantees, can be achieved
without the need for an unbounded number of exploratory trials, provided that
one is willing to mildly relax one's optimality requirements. We focus on the
canonical multi-armed bandit problem and seek to study the
exploration-preservation trade-off intrinsic to safe learning. More
precisely, by defining a handicap metric that counts the number of unsafe
actions, we provide an algorithm for discarding unsafe machines (or actions),
with probability one, that achieves constant handicap. Our algorithm is rooted
in the classical sequential probability ratio test, redefined here for
continuing tasks. Under standard assumptions on sufficient exploration, our
rule provably detects all unsafe machines in an (expected) finite number of
rounds. The analysis also unveils a trade-off between the number of rounds
needed to secure the environment and the probability of discarding safe
machines. Our decision rule can wrap around any other algorithm to optimize a
specific auxiliary goal since it provides a safe environment to search for
(approximately) optimal policies. Simulations corroborate our theoretical
findings and further illustrate the aforementioned trade-offs.
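To make the mechanism concrete, here is a minimal sketch of an SPRT-style discarding rule, assuming each machine emits a Bernoulli damage signal on every pull and "unsafe" means a damage probability at or above p1 (safe means at or below p0). The threshold follows the classical Wald test, and clamping the statistic at zero is one simple way to keep the test running on a continuing task. All names and parameters are illustrative, not the authors' implementation.

```python
import math
import random

def discard_unsafe_arms(arms, p0=0.05, p1=0.20, alpha=1e-3, rounds=10_000):
    """Per-arm sequential probability ratio test on Bernoulli damage signals.

    H0: damage probability <= p0 (safe) vs H1: damage probability >= p1 (unsafe).
    An arm whose log-likelihood ratio crosses the Wald threshold is discarded
    forever; clamping the statistic at zero keeps the test running, a simple
    continuing-task variant of the terminating SPRT.
    `arms` maps arm id -> callable returning 1 on a damage event, else 0.
    """
    threshold = math.log(1.0 / alpha)          # approximate Wald threshold
    llr = {a: 0.0 for a in arms}
    surviving = set(arms)
    handicap = 0                               # damage events: a proxy for the paper's handicap
    for _ in range(rounds):
        if not surviving:
            break
        arm = random.choice(tuple(surviving))  # any sufficiently exploratory rule works
        dmg = arms[arm]()
        handicap += dmg
        llr[arm] += math.log(p1 / p0) if dmg else math.log((1 - p1) / (1 - p0))
        llr[arm] = max(llr[arm], 0.0)          # clamp: never permanently "accept safe"
        if llr[arm] >= threshold:
            surviving.discard(arm)             # declared unsafe; never pulled again
    return surviving, handicap
```

For example, `discard_unsafe_arms({0: lambda: int(random.random() < 0.3), 1: lambda: 0})` should eventually flag machine 0 while machine 1 survives; lowering `alpha` delays detection but reduces the chance of discarding a safe machine, which is exactly the trade-off the abstract describes.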
Related papers
- Can a Bayesian Oracle Prevent Harm from an Agent? [48.12936383352277]
We consider estimating a context-dependent bound on the probability of violating a given safety specification.
Noting that different plausible hypotheses about the world could produce very different outcomes, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis.
We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such results into practical AI guardrails.
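One simple bound in this spirit: the posterior-predictive violation probability, divided by a lower bound on the posterior mass of the true hypothesis, upper-bounds the violation probability under that hypothesis. The sketch below illustrates that inequality only, not necessarily the paper's exact result; all names are hypothetical.

```python
def violation_bound(posterior, violation_prob, min_true_posterior):
    """Cautious bound on the violation probability under the unknown true hypothesis.

    posterior[h]       : posterior weight of hypothesis h (sums to 1)
    violation_prob[h]  : P(spec violated | h, context)
    min_true_posterior : assumed lower bound on the posterior of the true hypothesis

    Since sum_h posterior[h] * violation_prob[h] >= posterior[h*] * violation_prob[h*],
    dividing the posterior-predictive violation probability by a lower bound on
    posterior[h*] upper-bounds the violation probability under the true h*.
    """
    predictive = sum(posterior[h] * violation_prob[h] for h in posterior)
    return min(1.0, predictive / min_true_posterior)
```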
arXiv Detail & Related papers (2024-08-09T18:10:42Z)
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
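A rough sketch of the safe-acquisition pattern, using scikit-learn's GP and posterior variance as a crude stand-in for the paper's information-theoretic criterion: a candidate counts as safe only if its pessimistic confidence bound clears the threshold. Names and constants are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_safe_query(X_obs, s_obs, candidates, safe_thresh=0.0, beta=2.0):
    """Pick the next parameter to evaluate from a finite candidate set.

    Fits a GP to observed safety values s(x); a candidate is deemed safe only
    if its lower confidence bound mu - beta*std clears `safe_thresh`. Among
    safe candidates, the one with the largest posterior std is returned -- a
    crude stand-in for the paper's information-theoretic acquisition.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X_obs, s_obs)
    mu, std = gp.predict(candidates, return_std=True)
    safe = mu - beta * std > safe_thresh            # pessimistic safety check
    if not safe.any():
        return None                                 # nothing provably safe to try
    idx = np.where(safe)[0]
    return candidates[idx[np.argmax(std[idx])]]     # most uncertain safe candidate
```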
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution to the generalized safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE.
Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
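Schematically, a meta-algorithm of this kind wraps any base learner behind a pessimistic safety check and falls back to a guaranteed-safe action when the check fails. The sketch below shows only that pattern; `is_conservatively_safe`, `act`, and `learn` are assumed interfaces, not MASE's actual API.

```python
class SafeExplorationWrapper:
    """Schematic meta-algorithm: wrap any base learner behind a safety check.

    `is_conservatively_safe(state, action)` is assumed to be a pessimistic
    test (e.g., built from a confidence bound on the safety constraint); when
    it fails, a guaranteed-safe fallback runs and the base learner receives a
    penalty so it learns to steer away from the vetoed region.
    """

    def __init__(self, base_agent, is_conservatively_safe, fallback_action,
                 penalty=-10.0):
        self.base = base_agent
        self.safe = is_conservatively_safe
        self.fallback = fallback_action
        self.penalty = penalty

    def act(self, state):
        action = self.base.act(state)
        if self.safe(state, action):
            return action
        self.base.learn(state, action, self.penalty)  # discourage the unsafe proposal
        return self.fallback(state)                   # guaranteed-safe action instead
```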
arXiv Detail & Related papers (2023-10-05T00:47:09Z)
- A computationally lightweight safe learning algorithm [1.9295598343317182]
We propose a computationally lightweight safe learning algorithm that provides probabilistic safety guarantees by leveraging the Nadaraya-Watson estimator.
We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
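The Nadaraya-Watson estimator itself is standard kernel regression, m(x) = sum_i K((x - x_i)/h) * y_i / sum_i K((x - x_i)/h); a minimal version with a Gaussian kernel:

```python
import numpy as np

def nadaraya_watson(x_query, X, y, bandwidth=0.1):
    """Nadaraya-Watson kernel regression estimate at `x_query`.

    Weights each observed y_i by a Gaussian kernel on the distance between
    x_query and x_i, then returns the weighted average. One pass over the
    data, no matrix inversion.
    """
    d2 = np.sum((X - x_query) ** 2, axis=1)          # squared distances to data
    w = np.exp(-0.5 * d2 / bandwidth ** 2)           # Gaussian kernel weights
    return float(w @ y / (np.sum(w) + 1e-12))        # eps guards empty neighborhoods
```

Its appeal in this context is computational: unlike GP regression, a prediction needs no matrix inversion.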
arXiv Detail & Related papers (2023-09-07T12:21:22Z)
- Information-Theoretic Safe Exploration with Gaussian Processes [89.31922008981735]
We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2022-12-09T15:23:58Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of (state, action) pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
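The high-level loop is easy to picture: explore where safety is still unknown, query the binary oracle, shrink the uncertain set, and restrict the final policy to confirmed-safe pairs. The following is only that schematic, with `propose` and `optimize` as hypothetical methods of the blackbox PAC RL subroutine.

```python
def train_with_binary_oracle(candidate_pairs, oracle, pac_rl, rounds=10):
    """Schematic loop for learning with an offline binary safety oracle.

    Each round, a blackbox PAC RL subroutine proposes (state, action) pairs
    whose safety is still unknown; the oracle labels them, the uncertain set
    shrinks, and the final policy is restricted to confirmed-safe pairs.
    `propose` and `optimize` are hypothetical methods, not SABRE's API.
    """
    known_safe = set()
    unknown = set(candidate_pairs)
    for _ in range(rounds):
        if not unknown:
            break
        for sa in pac_rl.propose(unknown):      # explore where safety is uncertain
            if oracle(sa):                      # binary feedback: safe / unsafe
                known_safe.add(sa)
            unknown.discard(sa)
    return pac_rl.optimize(allowed=known_safe)  # plan only over confirmed-safe pairs
```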
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Near-Optimal Multi-Agent Learning for Safe Coverage Control [76.99020416197631]
In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density.
In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety.
We give first-of-its-kind results: near-optimal coverage in finite time while provably guaranteeing safety.
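As a caricature of the objective, coverage with an unknown density can be made pessimistic by planning against a lower confidence bound on the learned density while restricting agents to certified-safe locations. The greedy placement below is illustrative only (greedy is a standard heuristic for coverage objectives, not the paper's algorithm).

```python
import numpy as np

def place_agents(points, density_lcb, safe_mask, n_agents, radius):
    """Greedily place agents on certified-safe points to maximize covered density.

    points: (N, 2) candidate locations; density_lcb: (N,) lower confidence
    bound on the learned density; safe_mask: (N,) True where safety is
    certified. Pessimism in the density keeps the objective honest while the
    density is still being learned. Illustrative only.
    """
    covered = np.zeros(len(points), dtype=bool)
    chosen = []
    for _ in range(n_agents):
        best, best_gain = None, -np.inf
        for i in np.where(safe_mask)[0]:
            in_range = np.linalg.norm(points - points[i], axis=1) <= radius
            gain = density_lcb[in_range & ~covered].sum()  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break                                          # no certified-safe location
        chosen.append(best)
        covered |= np.linalg.norm(points - points[best], axis=1) <= radius
    return chosen
```

Coverage gains of this form are submodular, so greedy placement enjoys the usual (1 - 1/e) approximation guarantee, which is why it is a natural baseline here.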
arXiv Detail & Related papers (2022-10-12T16:33:34Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy optimization tasks in safe reinforcement learning.
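A single LBSGD-style step is easy to sketch: descend the barrier-augmented objective f(x) - eta * log(-g(x)) and shrink the step near the constraint boundary so iterates stay strictly feasible. The step-size rule below is a caricature of the paper's carefully chosen one.

```python
import numpy as np

def lbsgd_step(x, grad_f, g, grad_g, eta=0.1, max_step=0.1):
    """One log-barrier gradient step for min f(x) subject to g(x) <= 0.

    Descends f(x) - eta * log(-g(x)). The barrier gradient grows near the
    boundary, and the step is shrunk with the remaining slack so iterates
    stay strictly feasible. The step-size rule here is a caricature of
    LBSGD's carefully chosen one.
    """
    slack = -g(x)                                    # > 0 while strictly feasible
    d = grad_f(x) + (eta / slack) * grad_g(x)        # barrier-augmented gradient
    step = min(max_step, 0.5 * slack / (np.linalg.norm(d) + 1e-12))
    return x - step * d
```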
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe exploration requirement affects the sample complexity needed to achieve the desired optimality of the resulting policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework and develop two algorithms, coined Tabular-SWEET and Low-rank-SWEET, for the tabular and low-rank settings, respectively.
arXiv Detail & Related papers (2022-06-28T15:00:45Z)
- Safe Reinforcement Learning by Imagining the Near Future [37.0376099401243]
In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future.
We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions.
Experiments demonstrate that our algorithm can achieve competitive rewards with fewer safety violations in several continuous control tasks.
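The core idea admits a compact sketch: score candidate behavior by a short imagined rollout under a learned model, and let any imagined unsafe state contribute a large penalty. `model.step` and `model.is_unsafe` are assumed interfaces, not the paper's code.

```python
def penalized_return(model, policy, state, horizon=5, gamma=0.99, penalty=-100.0):
    """Score a state via a short imagined rollout under a learned model.

    Rolls the dynamics model forward a few steps; the first imagined unsafe
    state aborts the rollout with a large penalty, so trajectories heading
    toward unsafe regions are heavily down-weighted during planning.
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model.step(state, action)    # imagined transition
        if model.is_unsafe(state):
            return total + discount * penalty        # abort on imagined violation
        total += discount * reward
        discount *= gamma
    return total
```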
arXiv Detail & Related papers (2022-02-15T23:28:24Z)
- Learning to Act Safely with Limited Exposure and Almost Sure Certainty [1.0323063834827415]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials.
We first focus on the canonical multi-armed bandit problem and seek to study the intrinsic trade-offs of learning safety in the presence of uncertainty.
arXiv Detail & Related papers (2021-05-18T18:05:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.