Learning under Invariable Bayesian Safety
- URL: http://arxiv.org/abs/2006.04497v1
- Date: Mon, 8 Jun 2020 12:07:59 GMT
- Title: Learning under Invariable Bayesian Safety
- Authors: Gal Bahar, Omer Ben-Porat, Kevin Leyton-Brown and Moshe Tennenholtz
- Abstract summary: We adopt a model inspired by recent work on a bandit-like setting for recommendations.
We introduce a safety constraint that must be respected in every round: the expected value in each round must be above a given threshold.
- Score: 36.96284975799963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent body of work addresses safety constraints in explore-and-exploit
systems. Such constraints arise where, for example, exploration is carried out
by individuals whose welfare should be balanced with overall welfare. In this
paper, we adopt a model inspired by recent work on a bandit-like setting for
recommendations. We contribute to this line of literature by introducing a
safety constraint that must be respected in every round: the expected value in
each round must be above a given threshold. Under our model, a safe
explore-and-exploit policy requires careful planning; otherwise, it leads to
sub-optimal welfare. We devise an asymptotically
optimal algorithm for the setting and analyze its instance-dependent
convergence rate.
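To make the per-round constraint concrete, the sketch below shows one way such a constraint can be enforced: exploration of an uncertain arm is mixed with a known safe baseline arm so that each round's expected value stays above the threshold. This is an illustrative assumption, not the paper's algorithm; all names and parameters (tau, baseline_mean, the LCB rule, the simulated reward distributions) are hypothetical.
```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): keep the *expected*
# value of every round above a threshold `tau` by mixing exploration of an
# uncertain arm with a known safe baseline arm. All quantities are assumed.

rng = np.random.default_rng(0)

tau = 0.5                # per-round safety threshold (assumed)
baseline_mean = 0.7      # mean of a known safe arm (assumed known, > tau)
true_explore_mean = 0.9  # unknown to the learner; used only to simulate rewards

n_pulls, reward_sum, total = 0, 0.0, 0.0

for t in range(1, 10_001):
    # Lower confidence bound on the uncertain arm's mean (0 before any pull).
    lcb = 0.0 if n_pulls == 0 else max(
        0.0, reward_sum / n_pulls - np.sqrt(2 * np.log(t) / n_pulls)
    )

    # Largest exploration probability p such that the round's expected value,
    # p * lcb + (1 - p) * baseline_mean, is still at least tau.
    if lcb >= tau:
        p = 1.0
    else:
        p = min(1.0, (baseline_mean - tau) / (baseline_mean - lcb))

    if rng.random() < p:   # explore the uncertain arm
        reward = rng.binomial(1, true_explore_mean)
        n_pulls += 1
        reward_sum += reward
    else:                  # fall back to the safe baseline arm
        reward = rng.binomial(1, baseline_mean)
    total += reward

print("average realized value per round:", total / 10_000)
```
The point this sketch tries to convey, in line with the abstract, is that safety is enforced in expectation in every single round, so exploration has to be rationed carefully rather than performed greedily.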
Related papers
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe exploration requirement affects the sample complexity needed to achieve the desired optimality of the resulting policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework and develop two algorithms coined Tabular-SWEET and Low-rank-SWEET.
arXiv Detail & Related papers (2022-06-28T15:00:45Z)
- Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk [45.87122314291089]
We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints.
We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation.
This has practical relevance to scenarios such as clinical trials, where one must maintain safety for each round rather than in an aggregated sense.
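As an illustration only, such a softly penalised per-round pseudo-regret could take the form below; the notation (threshold tau, penalty weight lambda, played arm a_t) is assumed here and is not taken from that paper.
```latex
% Illustrative form only; \tau, \lambda, and a_t are assumed notation.
\mathcal{R}(T) \;=\; \sum_{t=1}^{T}\bigl(\mu^{*}-\mu_{a_t}\bigr)
\;+\; \lambda \sum_{t=1}^{T}\bigl(\tau-\mu_{a_t}\bigr)_{+}
```
Here the first sum is the usual reward regret and the second softly penalises every round whose expected value falls below the threshold.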
arXiv Detail & Related papers (2022-04-01T22:08:03Z)
- Learning to Act Safely with Limited Exposure and Almost Sure Certainty [1.0323063834827415]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for exploratory trials.
We first focus on the canonical multi-armed bandit problem and seek to study the intrinsic trade-offs of learning safety in the presence of uncertainty.
arXiv Detail & Related papers (2021-05-18T18:05:12Z)
- Learning to be safe, in finite time [4.189643331553922]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials.
We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic within safe learning.
arXiv Detail & Related papers (2020-10-01T14:03:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.