Replicability is Asymptotically Free in Multi-armed Bandits
- URL: http://arxiv.org/abs/2402.07391v1
- Date: Mon, 12 Feb 2024 03:31:34 GMT
- Title: Replicability is Asymptotically Free in Multi-armed Bandits
- Authors: Junpei Komiyama, Shinji Ito, Yuichi Yoshida, Souta Koshino
- Abstract summary: This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
- Score: 45.729105054410745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is motivated by the growing demand for reproducible machine
learning. We study the stochastic multi-armed bandit problem. In particular, we
consider a replicable algorithm that ensures, with high probability, that the
algorithm's sequence of actions is not affected by the randomness inherent in
the dataset. We observe that existing algorithms require $O(1/\rho^2)$ times
more regret than nonreplicable algorithms, where $\rho$ is the level of
nonreplication. However, we demonstrate that this additional cost is
unnecessary when the time horizon $T$ is sufficiently large for a given $\rho$,
provided that the magnitude of the confidence bounds is chosen carefully. We
introduce an explore-then-commit algorithm that draws arms uniformly before
committing to a single arm. Additionally, we examine a successive elimination
algorithm that eliminates suboptimal arms at the end of each phase. To ensure
the replicability of these algorithms, we incorporate randomness into their
decision-making processes. We extend the use of successive elimination to the
linear bandit problem as well. For the analysis of these algorithms, we propose
a principled approach to limiting the probability of nonreplication. This
approach elucidates the steps that existing research has implicitly followed.
Furthermore, we derive the first lower bound for the two-armed replicable
bandit problem, which implies the optimality of the proposed algorithms up to a
$\log\log T$ factor for the two-armed case.
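
The key mechanism described in the abstract, injecting shared randomness into the commit decision so that the chosen action sequence is insensitive to the randomness of the dataset, can be illustrated with a short sketch. The Python snippet below is a minimal, illustrative two-armed explore-then-commit variant, not the paper's algorithm: the function name `replicable_etc`, the reward oracle `pull`, the exploration budget, and the width of the randomized offset are all assumptions chosen for readability rather than the constants analyzed in the paper.

```python
import numpy as np

def replicable_etc(pull, T, rho=0.1, delta=0.05, seed=0):
    """Two-armed replicable explore-then-commit (illustrative sketch).

    `pull(arm)` is assumed to return a stochastic reward in [0, 1].
    The exploration budget and the randomized-threshold width are
    heuristic choices, not the constants from the paper.
    """
    rng_shared = np.random.default_rng(seed)  # shared across runs; only the data differ
    n = max(1, min(T // 4, int(T ** (2 / 3))))  # per-arm exploration budget (heuristic)

    # Uniform exploration phase: pull each arm n times.
    means = np.array([np.mean([pull(a) for _ in range(n)]) for a in (0, 1)])

    # Hoeffding-style confidence radius for each empirical mean.
    conf = np.sqrt(np.log(4 / delta) / (2 * n))

    # Replicable commit rule: perturb the comparison of the empirical means by
    # an offset drawn from the shared randomness. The offset window is wider
    # than the estimation noise by a factor of 1/rho, so two runs on
    # independent datasets commit to the same arm except with probability
    # roughly rho.
    offset = rng_shared.uniform(-2 * conf / rho, 2 * conf / rho)
    best = 0 if means[0] - means[1] + offset > 0 else 1

    # Commit to the chosen arm for the remaining rounds.
    tail_rewards = [pull(best) for _ in range(T - 2 * n)]
    return best, float(np.mean(tail_rewards))


# Usage with Bernoulli arms (means 0.6 and 0.5): rerunning with the same seed
# but freshly drawn rewards commits to the same arm with high probability.
if __name__ == "__main__":
    arm_means = [0.6, 0.5]
    pull = lambda a: float(np.random.random() < arm_means[a])
    print(replicable_etc(pull, T=100_000, rho=0.1))
```

Because the offset is drawn from randomness shared across runs while the rewards are not, the commit decision changes only when the empirical gap happens to fall inside a window of relative width about rho, which is the informal version of the replicability guarantee sketched here.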
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery [45.601316850669406]
We present efficient algorithms for policy evaluation, best policy identification and regret minimization.
For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal.
All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix.
arXiv Detail & Related papers (2024-02-24T06:36:08Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of the successive-elimination algorithm is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption level $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive term that depends on $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We show that this regret bound depends on the gap between the highest reward and the rewards of the other arms.
arXiv Detail & Related papers (2020-06-16T15:33:12Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z) - Rate-adaptive model selection over a collection of black-box contextual
bandit algorithms [0.966840768820136]
We consider the model selection task in the contextual bandit setting.
Our proposal is the first one to be rate-adaptive for a collection of general black-box contextual bandit algorithms.
arXiv Detail & Related papers (2020-06-05T18:55:16Z) - Bandit algorithms to emulate human decision making using probabilistic
distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)