Replicability is Asymptotically Free in Multi-armed Bandits
- URL: http://arxiv.org/abs/2402.07391v1
- Date: Mon, 12 Feb 2024 03:31:34 GMT
- Title: Replicability is Asymptotically Free in Multi-armed Bandits
- Authors: Junpei Komiyama, Shinji Ito, Yuichi Yoshida, Souta Koshino
- Abstract summary: This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
- Score: 45.729105054410745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is motivated by the growing demand for reproducible machine
learning. We study the stochastic multi-armed bandit problem. In particular, we
consider a replicable algorithm that ensures, with high probability, that the
algorithm's sequence of actions is not affected by the randomness inherent in
the dataset. We observe that existing algorithms require $O(1/\rho^2)$ times
more regret than nonreplicable algorithms, where $\rho$ is the level of
nonreplication. However, we demonstrate that this additional cost is
unnecessary when the time horizon $T$ is sufficiently large for a given $\rho$,
provided that the magnitude of the confidence bounds is chosen carefully. We
introduce an explore-then-commit algorithm that draws arms uniformly before
committing to a single arm. Additionally, we examine a successive elimination
algorithm that eliminates suboptimal arms at the end of each phase. To ensure
the replicability of these algorithms, we incorporate randomness into their
decision-making processes. We extend the use of successive elimination to the
linear bandit problem as well. For the analysis of these algorithms, we propose
a principled approach to limiting the probability of nonreplication. This
approach elucidates the steps that existing research has implicitly followed.
Furthermore, we derive the first lower bound for the two-armed replicable
bandit problem, which implies the optimality of the proposed algorithms up to a
$\log\log T$ factor for the two-armed case.
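
The key mechanism described in the abstract, injecting shared randomness into the commit decision so that the chosen action sequence is insensitive to the randomness of the dataset, can be illustrated with a short sketch. The Python snippet below is a minimal, illustrative two-armed explore-then-commit variant, not the paper's algorithm: the function name `replicable_etc`, the reward oracle `pull`, the exploration budget, and the width of the randomized offset are all assumptions chosen for readability rather than the constants analyzed in the paper.

```python
import numpy as np

def replicable_etc(pull, T, rho=0.1, delta=0.05, seed=0):
    """Two-armed replicable explore-then-commit (illustrative sketch).

    `pull(arm)` is assumed to return a stochastic reward in [0, 1].
    The exploration budget and the randomized-threshold width are
    heuristic choices, not the constants from the paper.
    """
    rng_shared = np.random.default_rng(seed)  # shared across runs; only the data differ
    n = max(1, min(T // 4, int(T ** (2 / 3))))  # per-arm exploration budget (heuristic)

    # Uniform exploration phase: pull each arm n times.
    means = np.array([np.mean([pull(a) for _ in range(n)]) for a in (0, 1)])

    # Hoeffding-style confidence radius for each empirical mean.
    conf = np.sqrt(np.log(4 / delta) / (2 * n))

    # Replicable commit rule: perturb the comparison of the empirical means by
    # an offset drawn from the shared randomness. The offset window is wider
    # than the estimation noise by a factor of 1/rho, so two runs on
    # independent datasets commit to the same arm except with probability
    # roughly rho.
    offset = rng_shared.uniform(-2 * conf / rho, 2 * conf / rho)
    best = 0 if means[0] - means[1] + offset > 0 else 1

    # Commit to the chosen arm for the remaining rounds.
    tail_rewards = [pull(best) for _ in range(T - 2 * n)]
    return best, float(np.mean(tail_rewards))


# Usage with Bernoulli arms (means 0.6 and 0.5): rerunning with the same seed
# but freshly drawn rewards commits to the same arm with high probability.
if __name__ == "__main__":
    arm_means = [0.6, 0.5]
    pull = lambda a: float(np.random.random() < arm_means[a])
    print(replicable_etc(pull, T=100_000, rho=0.1))
```

Because the offset is drawn from randomness shared across runs while the rewards are not, the commit decision changes only when the empirical gap happens to fall inside a window of relative width about rho, which is the informal version of the replicability guarantee sketched here.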
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery [45.601316850669406]
We present efficient algorithms for policy evaluation, best policy identification and regret minimization.
For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal.
All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix.
arXiv Detail & Related papers (2024-02-24T06:36:08Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of the successive-elimination algorithm is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption level $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive term that depends on $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We show that this regret bound depends on the gap between the highest reward and the rewards of the other arms.
arXiv Detail & Related papers (2020-06-16T15:33:12Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z) - Rate-adaptive model selection over a collection of black-box contextual
bandit algorithms [0.966840768820136]
We consider the model selection task in the contextual bandit setting.
Our proposal is the first one to be rate-adaptive for a collection of general black-box contextual bandit algorithms.
arXiv Detail & Related papers (2020-06-05T18:55:16Z) - Bandit algorithms to emulate human decision making using probabilistic
distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)