Max-Quantile Grouped Infinite-Arm Bandits
- URL: http://arxiv.org/abs/2210.01295v1
- Date: Tue, 4 Oct 2022 00:46:49 GMT
- Title: Max-Quantile Grouped Infinite-Arm Bandits
- Authors: Ivan Lau, Yan Hao Ling, Mayank Shrivastava, Jonathan Scarlett
- Abstract summary: bandit problem in which there are a number of groups each consisting of infinitely many arms.
We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm.
We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter.
- Score: 39.7239093424124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider a bandit problem in which there are a number of
groups each consisting of infinitely many arms. Whenever a new arm is requested
from a given group, its mean reward is drawn from an unknown reservoir
distribution (different for each group), and the uncertainty in the arm's mean
reward can only be reduced via subsequent pulls of the arm. The goal is to
identify the infinite-arm group whose reservoir distribution has the highest
$(1-\alpha)$-quantile (e.g., median if $\alpha = \frac{1}{2}$), using as few
total arm pulls as possible. We introduce a two-step algorithm that first
requests a fixed number of arms from each group and then runs a finite-arm
grouped max-quantile bandit algorithm. We characterize both the
instance-dependent and worst-case regret, and provide a matching lower bound
for the latter, while discussing various strengths, weaknesses, algorithmic
improvements, and potential lower bounds associated with our instance-dependent
upper bounds.
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $Rtoinfty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Contextual Combinatorial Multi-output GP Bandits with Group Constraints [11.317136648551537]
In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.
We consider a contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups and a set of base arms, called a super arm, must be chosen in each round to maximize super arm reward while satisfying the constraints of the rewards of groups from which base arms were chosen.
We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatored Upper Confidence Bounds (TCGP-UCB), which balances between maximizing cumulative super arm reward and satisfying
arXiv Detail & Related papers (2021-11-29T18:39:09Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arms can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $Oleft(frack2 n log Tepsilonright)$ gap-dependent regret bound as well as an $Oleft(k2sqrtn T log Tright)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(ksqrt(A-k+1)T log (|mathcalF|T))$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Multi-Armed Bandits with Dependent Arms [18.81667618369821]
We study a variant of the classical multi-armed bandit problem (MABP) which we call as Multi-Armed Bandits with dependent arms.
Multiple arms are grouped together to form a cluster, and the reward distributions of arms belonging to the same cluster are known functions of an unknown parameter that is a characteristic of the cluster.
We develop learning algorithms based on the UCB principle which utilize these additional side observations appropriately while performing exploration-exploitation trade-off.
arXiv Detail & Related papers (2020-10-13T14:00:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.