Avoiding Catastrophe in Online Learning by Asking for Help
- URL: http://arxiv.org/abs/2402.08062v3
- Date: Fri, 04 Oct 2024 15:23:43 GMT
- Title: Avoiding Catastrophe in Online Learning by Asking for Help
- Authors: Benjamin Plaut, Hanlin Zhu, Stuart Russell
- Abstract summary: We propose an online learning problem where the goal is to minimize the chance of catastrophe.
We provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows.
- Score: 7.881265948305421
- Abstract: Most learning algorithms with formal regret guarantees assume that no mistake is irreparable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are \emph{catastrophic}, i.e., irreparable. We propose an online learning problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe that round and aim to maximize the product of payoffs (the overall chance of avoiding catastrophe) while allowing a limited number of queries to a mentor. We first show that in general, any algorithm either constantly queries the mentor or is nearly guaranteed to cause catastrophe. However, in settings where the mentor policy class is learnable in the standard online learning model, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. Conceptually, if a policy class is learnable in the absence of catastrophic risk, it is learnable in the presence of catastrophic risk if the agent can ask for help.
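To make the setting concrete, here is a minimal Python sketch of the interaction protocol described in the abstract. The threshold-based query rule and the helper names (`agent_policy`, `mentor_policy`, `payoff`, `confidence`) are illustrative assumptions, not the paper's algorithm; the sketch only shows how per-round payoffs multiply into an overall chance of avoiding catastrophe under a limited number of mentor queries.
```python
import math
import random

def run_protocol(T, agent_policy, mentor_policy, payoff, confidence,
                 query_threshold=0.9):
    """Toy version of the setting: each round the agent either queries the
    mentor or acts on its own; payoff(state, action) in (0, 1] is the chance
    of avoiding catastrophe that round, and the objective is the product of
    payoffs over all rounds. All helper names are hypothetical."""
    log_product = 0.0          # accumulate log-payoffs to avoid underflow
    queries = 0
    for _ in range(T):
        state = random.random()                    # stand-in environment state
        if confidence(state) < query_threshold:    # illustrative query rule
            action = mentor_policy(state)          # ask for help
            queries += 1
        else:
            action = agent_policy(state)
        log_product += math.log(payoff(state, action))
    return math.exp(log_product), queries          # overall survival chance, #queries
```
Regret in this setting compares the returned survival chance against the mentor's own product of payoffs; the paper's positive result is that, for mentor policy classes learnable in the standard online model, both this regret and queries/T can be driven to 0 as T grows.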
Related papers
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z)
- Online Learning with Unknown Constraints [10.263431543520452]
We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round.
The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round.
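As a rough illustration of this setting (not the paper's algorithm), a learner can maintain a conservatively estimated safe set and restrict play to it; all names below are hypothetical:
```python
from collections import defaultdict

def cautious_play(T, sample_loss, reveals_violation, init_safe):
    """Illustrative loop for online learning under an unknown safety
    constraint: play only actions currently believed safe, track empirical
    losses, and permanently drop any action whose feedback reveals a
    violation. A sketch under invented feedback assumptions only."""
    believed_safe = set(init_safe)                    # conservative starting set
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t in range(T):
        if not believed_safe:
            raise RuntimeError("no action is still certified safe")
        # untried actions get -inf, forcing one round of exploration each
        a = min(believed_safe,
                key=lambda x: sums[x] / counts[x] if counts[x] else float("-inf"))
        loss = sample_loss(a, t)
        sums[a] += loss
        counts[a] += 1
        if reveals_violation(a, t):                   # safety feedback this round
            believed_safe.discard(a)
    return {a: sums[a] / counts[a] for a in counts}
```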
arXiv Detail & Related papers (2024-03-06T20:23:59Z)
- Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP.
We prove that our algorithm achieves sublinear regret without error cancellations.
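For context, a generic primal-dual (Lagrangian) update for a CMDP with constraint E[cost] <= budget looks as follows; this is a textbook sketch, not the paper's model-based algorithm or its analysis:
```python
def dual_ascent_step(lmbda, value_reward, value_cost, budget, step_size):
    """Generic primal-dual CMDP update: the primal player optimizes the
    Lagrangian reward - lambda * cost, while the dual variable lambda
    increases whenever the constraint E[cost] <= budget is violated.
    Textbook sketch only, not the paper's algorithm."""
    lagrangian = value_reward - lmbda * value_cost   # objective for the policy step
    violation = value_cost - budget                  # positive if constraint broken
    lmbda = max(0.0, lmbda + step_size * violation)  # projected dual ascent
    return lmbda, lagrangian
```
"Error cancellations" refers, roughly, to analyses in which over-satisfying the constraint in some episodes is allowed to offset violations in others; the paper's sublinear regret bound rules this out.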
arXiv Detail & Related papers (2024-02-24T09:47:46Z)
- The Risks of Recourse in Binary Classification [10.067421338825545]
We study whether providing algorithmic recourse is beneficial or harmful at the population level.
We find that there are many plausible scenarios in which providing recourse turns out to be harmful.
We conclude that the current concept of algorithmic recourse is not reliably beneficial, and therefore requires rethinking.
arXiv Detail & Related papers (2023-06-01T09:46:43Z)
- Agnostic Multi-Robust Learning Using ERM [19.313739782029185]
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially many perturbations that an adversary might make to a test-time natural example.
In contrast, the attacker only needs to find one successful perturbation.
We introduce a novel multi-group setting and a corresponding multi-robust learning problem.
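The asymmetry is easy to state in code. Below is a hedged sketch of the adversarially robust 0-1 risk (generic robust ERM, not the paper's multi-group construction); `perturbations` is a hypothetical stand-in for the adversary's allowed corruption set:
```python
def robust_empirical_risk(model, data, perturbations):
    """Adversarially robust 0-1 risk: a point counts as an error if ANY
    allowed perturbation flips the prediction (the asymmetry noted above:
    the learner must survive every perturbation, the attacker needs only
    one). perturbations(x) yields the allowed corruptions of x."""
    errors = 0
    for x, y in data:
        if any(model(x_adv) != y for x_adv in perturbations(x)):
            errors += 1
    return errors / len(data)
```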
arXiv Detail & Related papers (2023-03-15T21:30:14Z)
- Learning to Be Cautious [71.9871661858886]
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations.
We present a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to \emph{learn} to be cautious.
arXiv Detail & Related papers (2021-10-29T16:52:45Z)
- Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
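For background, the regret-matching update at the core of CFR-style methods looks like this (a standard textbook version; the paper's contribution of handling an unknown strategy space is not shown):
```python
def regret_matching(cum_regret):
    """Standard regret matching: play each action with probability
    proportional to its positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def update_regrets(cum_regret, utilities, strategy):
    """After observing per-action utilities u(a), each action's regret grows
    by u(a) minus the expected utility of the mixed strategy just played."""
    expected = sum(p * u for p, u in zip(strategy, utilities))
    return [r + u - expected for r, u in zip(cum_regret, utilities)]
```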
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
- Online non-convex optimization with imperfect feedback [33.80530308979131]
We consider the problem of online learning with non-convex losses.
In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function at each stage.
We propose a mixed-strategy learning policy based on dual averaging.
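A minimal sketch of dual averaging over the probability simplex follows, assuming inexact gradient models are simply summed; the paper's mixed-strategy policy and its treatment of imperfect feedback are more refined:
```python
import math

def dual_averaging_step(grad_sum, grad_model, t, eta=1.0):
    """Generic dual averaging on the simplex: accumulate (possibly inexact)
    gradient models, then play the softmax of the negated running sum with
    a decaying step size. Sketch only, not the paper's policy."""
    grad_sum = [g + gi for g, gi in zip(grad_sum, grad_model)]  # running sum
    step = eta / math.sqrt(t + 1)                               # decaying step
    scores = [-step * g for g in grad_sum]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]                 # stable softmax
    z = sum(weights)
    return grad_sum, [w / z for w in weights]                   # new mixed strategy
```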
arXiv Detail & Related papers (2020-10-16T16:53:13Z)
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
We compute unbiased navigation gradients of the value function, which we use as ascent directions to update the policy.
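For reference, the classical episodic score-function (REINFORCE) estimator below produces unbiased policy-gradient estimates; the paper's point is extending unbiased gradient estimation beyond episodic tasks, which this sketch does not capture. Names like `grad_log_pi` are assumptions:
```python
def reinforce_gradient(trajectory, grad_log_pi, gamma=0.99):
    """Classical REINFORCE: unbiased estimate of the discounted policy
    gradient from one sampled episode. trajectory is a list of
    (state, action, reward) triples; grad_log_pi(s, a) returns the
    gradient of log pi(a | s) as a list of floats."""
    # discounted returns-to-go, computed backwards
    returns = []
    G = 0.0
    for (_, _, r) in reversed(trajectory):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    grad = None
    for t, ((s, a, _), G_t) in enumerate(zip(trajectory, returns)):
        coef = (gamma ** t) * G_t      # gamma**t keeps the estimator unbiased
        term = [coef * g for g in grad_log_pi(s, a)]
        grad = term if grad is None else [x + y for x, y in zip(grad, term)]
    return grad
```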
arXiv Detail & Related papers (2020-10-16T15:15:42Z)
- Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
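One toy way to tie risk to a failure budget is sketched below. This is an invented decision rule for illustration only; the paper's controller is grounded in control theory, and all function names here are hypothetical stand-ins for model-based estimates:
```python
def pick_next_query(candidates, expected_improvement, failure_prob,
                    failures_so_far, budget, base_risk=0.5):
    """Toy budget-aware rule for constrained Bayesian optimization: cap the
    tolerated failure probability in proportion to the remaining failure
    budget, then maximize expected improvement among candidates under the
    cap. Assumes budget > 0."""
    remaining = max(budget - failures_so_far, 0)
    risk_cap = base_risk * remaining / budget        # tightens as failures accrue
    feasible = [x for x in candidates if failure_prob(x) <= risk_cap]
    if feasible:
        return max(feasible, key=expected_improvement)
    return min(candidates, key=failure_prob)         # nothing qualifies: play safest
```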
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.