Reproducible Bandits
- URL: http://arxiv.org/abs/2210.01898v1
- Date: Tue, 4 Oct 2022 20:36:45 GMT
- Title: Reproducible Bandits
- Authors: Hossein Esfandiari, Alkis Kalavasis, Amin Karbasi, Andreas Krause,
Vahab Mirrokni, Grigoris Velegkas
- Abstract summary: A policy in the bandit environment is called reproducible if it pulls, with high probability, the exact same sequence of arms in two different executions.
We show that not only do reproducible policies exist, but also they achieve almost the same optimal (non-reproducible) regret bounds in terms of the time horizon.
Our results show that even though randomization is crucial for the exploration-exploitation trade-off, an optimal balance can still be achieved while pulling the exact same arms in two different rounds of executions.
- Score: 95.8830340560603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the notion of reproducible policies in the
context of stochastic bandits, one of the canonical problems in interactive
learning. A policy in the bandit environment is called reproducible if it
pulls, with high probability, the \emph{exact} same sequence of arms in two
different and independent executions (i.e., under independent reward
realizations). We show that not only do reproducible policies exist, but also
they achieve almost the same optimal (non-reproducible) regret bounds in terms
of the time horizon. More specifically, in the stochastic multi-armed bandits
setting, we develop a policy with an optimal problem-dependent regret bound
whose dependence on the reproducibility parameter is also optimal. Similarly,
for stochastic linear bandits (with finitely and infinitely many arms) we
develop reproducible policies that achieve the best-known problem-independent
regret bounds with an optimal dependency on the reproducibility parameter. Our
results show that even though randomization is crucial for the
exploration-exploitation trade-off, an optimal balance can still be achieved
while pulling the exact same arms in two different rounds of executions.
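To make the definition concrete, here is a minimal toy sketch (not the paper's policy) of what reproducibility asks for: two executions of the same policy share the policy's internal randomness but see independent reward realizations, and should still produce identical pull sequences with high probability. The explore-then-commit structure, the rounding grid, and all parameters below are illustrative assumptions.

```python
import numpy as np

def explore_then_commit(true_means, horizon, explore_per_arm, policy_rng, reward_rng):
    """Toy policy: explore each arm a fixed number of times, then commit to the arm
    whose empirical mean is largest after rounding to a coarse grid with a random
    offset.  `policy_rng` is the policy's internal randomness (shared across the two
    executions); `reward_rng` drives the rewards (independent across executions)."""
    k = len(true_means)
    sums, counts, pulls = np.zeros(k), np.zeros(k), []
    # Deterministic round-robin exploration phase.
    for t in range(explore_per_arm * k):
        arm = t % k
        pulls.append(arm)
        sums[arm] += reward_rng.normal(true_means[arm], 0.1)
        counts[arm] += 1
    # Randomized rounding with a shared offset keeps the commitment decision stable
    # when the two executions' empirical means differ only slightly.
    grid = 0.1
    offset = policy_rng.uniform(0.0, grid)
    best = int(np.argmax(np.floor((sums / counts - offset) / grid)))
    pulls.extend([best] * (horizon - len(pulls)))
    return pulls

means = [0.2, 0.8, 0.5]
policy_seed = 7                                         # internal randomness: shared
run1 = explore_then_commit(means, 2000, 100, np.random.default_rng(policy_seed),
                           np.random.default_rng(1))    # rewards: independent
run2 = explore_then_commit(means, 2000, 100, np.random.default_rng(policy_seed),
                           np.random.default_rng(2))
print("identical pull sequences:", run1 == run2)
```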
Related papers
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
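For context, a minimal sketch of the classical EXP4 update that the snippet builds on (the textbook algorithm, not the paper's capacity-dependent analysis; the advice matrices, rewards, and learning rate are inputs assumed here for illustration):

```python
import numpy as np

def exp4(advice, rewards, eta, rng):
    """Classical EXP4: at round t, `advice[t]` is an (n_experts, n_arms) matrix whose
    rows are the experts' probability distributions over arms, and `rewards[t]` is the
    reward vector (only the pulled arm's entry is actually observed)."""
    n_experts, n_arms = advice[0].shape
    w = np.ones(n_experts)
    total_reward = 0.0
    for adv_t, r_t in zip(advice, rewards):
        q = w / w.sum()                      # exponential-weights distribution over experts
        p = q @ adv_t                        # induced distribution over arms
        arm = rng.choice(n_arms, p=p)
        total_reward += r_t[arm]
        r_hat = np.zeros(n_arms)             # importance-weighted reward estimate
        r_hat[arm] = r_t[arm] / p[arm]
        w *= np.exp(eta * (adv_t @ r_hat))   # credit each expert for its own advice
    return total_reward
```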
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
- Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
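As background on the first ingredient, a minimal Beta-Bernoulli Thompson sampling step looks as follows; the best challenger rule and the fixed-confidence stopping criterion of the paper are omitted, and the toy arm means below are assumptions.

```python
import numpy as np

def thompson_step(successes, failures, rng):
    """One round of Beta-Bernoulli Thompson sampling: sample a mean for each arm
    from its posterior and pull the arm with the largest sample."""
    samples = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))

rng = np.random.default_rng(0)
true_means = [0.3, 0.6, 0.5]
succ = np.zeros(3)
fail = np.zeros(3)
for _ in range(1000):
    arm = thompson_step(succ, fail, rng)
    reward = rng.random() < true_means[arm]
    succ[arm] += reward
    fail[arm] += 1 - reward
print("pull counts:", succ + fail)      # most pulls concentrate on the best arm
```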
arXiv Detail & Related papers (2023-10-01T01:37:02Z)
- Best of Both Worlds Model Selection [39.211071446838474]
We study the problem of model selection in bandit scenarios in the presence of nested policy classes.
Our approach requires that each base learner comes with a candidate regret bound that may or may not hold.
These are the first theoretical results that achieve best of both worlds (stochastic and adversarial) guarantees while performing model selection in (linear) bandit scenarios.
arXiv Detail & Related papers (2022-06-29T20:57:30Z)
- The Countable-armed Bandit with Vanishing Arms [8.099977107670918]
We consider a bandit problem with countably many arms partitioned into finitely many "types".
A "non-stationary" distribution governs the relative abundance of each arm type in the population of arms, also known as the "arm-reservoir".
arXiv Detail & Related papers (2021-10-23T02:47:55Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
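The paper treats the adversarial case; purely for intuition about "zooming in", here is a sketch of adaptive discretization in the simpler stochastic setting on [0, 1]. The optimistic index, splitting rule, and parameters are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def zooming_sketch(f, horizon, lipschitz=1.0, noise=0.1, rng=None):
    """Adaptive discretization on [0, 1]: keep a set of active intervals, play the one
    with the largest optimistic index, and split an interval once its statistical
    uncertainty drops below its width."""
    rng = rng or np.random.default_rng(0)
    cells = [[0.0, 1.0, 0.0, 0]]                      # each cell: [left, right, reward_sum, pulls]
    for _ in range(horizon):
        def index(c):
            left, right, s, n = c
            if n == 0:
                return float("inf")                   # unexplored cells get priority
            conf = np.sqrt(2 * np.log(horizon) / n)
            return s / n + conf + lipschitz * (right - left)   # optimism + discretization bonus
        c = max(cells, key=index)
        x = 0.5 * (c[0] + c[1])                       # play the cell's midpoint
        c[2] += f(x) + rng.normal(0.0, noise)
        c[3] += 1
        # Zoom in: split the cell once its confidence radius falls below its width.
        if np.sqrt(2 * np.log(horizon) / c[3]) < c[1] - c[0]:
            mid = 0.5 * (c[0] + c[1])
            cells.remove(c)
            cells += [[c[0], mid, 0.0, 0], [mid, c[1], 0.0, 0]]
    return cells

cells = zooming_sketch(lambda x: 1.0 - abs(x - 0.7), horizon=5000)
print("finest interval:", min(cells, key=lambda c: c[1] - c[0])[:2])   # concentrates near 0.7
```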
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
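Not the paper's estimator or algorithm, but a toy illustration of why targeting the mode differs from targeting the mean; the binning estimator and the two reward distributions below are assumptions.

```python
import numpy as np
from collections import defaultdict

def mode_estimate(samples, bin_width=0.1):
    """Estimate the mode of a reward distribution by histogram binning
    (a simple plug-in estimator, for illustration only)."""
    counts = defaultdict(int)
    for x in samples:
        counts[int(np.floor(x / bin_width))] += 1
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width

# Toy comparison: arm A has the higher mean, but arm B has the higher mode.
rng = np.random.default_rng(0)
arm_a = np.concatenate([rng.normal(0.2, 0.05, 800),
                        rng.normal(2.0, 0.05, 200)])   # mean ~0.56, mode ~0.2
arm_b = rng.normal(0.5, 0.05, 1000)                    # mean ~0.5, mode ~0.5
print(mode_estimate(arm_a), mode_estimate(arm_b))      # the mode criterion prefers arm B
```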
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards [10.66048003460524]
We study an extension of the classic multi-armed bandit problem which involves multiple plays and Markovian rewards in the rested bandits setting.
In order to tackle this problem we consider an adaptive allocation rule which at each stage combines the information from the sample means of all the arms with the Kullback-Leibler upper confidence bound of a single arm which is selected in a round-robin way.
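As background, the standard Bernoulli KL-UCB index that such a rule relies on can be computed by bisection as follows (a generic sketch, not the paper's full allocation rule for multiple plays or Markovian rewards).

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """KL-UCB index for a Bernoulli arm: the largest q >= mean such that
    pulls * kl(mean, q) <= log(t), found by bisection."""
    budget = np.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

print(kl_ucb_index(mean=0.4, pulls=50, t=1000))   # an optimistic estimate above 0.4
```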
arXiv Detail & Related papers (2020-01-30T08:09:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.