Combinatorial Blocking Bandits with Stochastic Delays
- URL: http://arxiv.org/abs/2105.10625v1
- Date: Sat, 22 May 2021 02:46:04 GMT
- Title: Combinatorial Blocking Bandits with Stochastic Delays
- Authors: Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine
Caramanis, Sanjay Shakkottai
- Abstract summary: Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arm can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the maximum expected reward feasible subset among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above heuristic into a UCB-based bandit algorithm with sublinear (approximate) regret guarantees.
- Score: 33.65025386998747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has considered natural variations of the multi-armed bandit
problem, where the reward distribution of each arm is a special function of the
time passed since its last pulling. In this direction, a simple (yet widely
applicable) model is that of blocking bandits, where an arm becomes unavailable
for a deterministic number of rounds after each play. In this work, we extend
the above model in two directions: (i) We consider the general combinatorial
setting where more than one arm can be played at each round, subject to
feasibility constraints. (ii) We allow the blocking time of each arm to be
stochastic. We first study the computational/unconditional hardness of the
above setting and identify the necessary conditions for the problem to become
tractable (even in an approximate sense). Based on these conditions, we provide
a tight analysis of the approximation guarantee of a natural greedy heuristic
that always plays the maximum expected reward feasible subset among the
available (non-blocked) arms. When the arms' expected rewards are unknown, we
adapt the above heuristic into a bandit algorithm, based on UCB, for which we
provide sublinear (approximate) regret guarantees, matching the theoretical
lower bounds in the limiting case of absence of delays.
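To make the abstract's approach concrete, here is a minimal, hypothetical sketch of the greedy UCB idea: at each round, among the currently available (non-blocked) arms, greedily play the feasible subset with the largest UCB indices, then block each played arm for a stochastic delay. The cardinality constraint (at most $k$ arms per round) and the geometric delay distribution are illustrative assumptions, not the paper's exact setting.

```python
import math
import random

class GreedyBlockingUCB:
    """Illustrative sketch of a UCB-based greedy policy for combinatorial
    blocking bandits. Feasibility is modeled as a cardinality constraint
    (at most k arms per round); blocking delays are drawn geometrically.
    Both choices are assumptions made for this example."""

    def __init__(self, n_arms, k, delay_p=0.5):
        self.n_arms = n_arms
        self.k = k                       # feasibility: play at most k arms per round
        self.delay_p = delay_p           # success parameter of the geometric delay
        self.counts = [0] * n_arms       # number of pulls per arm
        self.sums = [0.0] * n_arms       # cumulative observed reward per arm
        self.blocked_until = [0] * n_arms
        self.t = 0

    def ucb_index(self, a):
        if self.counts[a] == 0:
            return float("inf")          # force initial exploration of each arm
        mean = self.sums[a] / self.counts[a]
        return mean + math.sqrt(2 * math.log(self.t + 1) / self.counts[a])

    def select(self):
        """Greedy step: pick the k available arms with the largest UCB index."""
        self.t += 1
        available = [a for a in range((self.n_arms))
                     if self.blocked_until[a] <= self.t]
        available.sort(key=self.ucb_index, reverse=True)
        return available[: self.k]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        # Stochastic blocking: the arm becomes unavailable for a
        # geometric number of future rounds.
        delay = 1
        while random.random() > self.delay_p:
            delay += 1
        self.blocked_until[arm] = self.t + delay
```

A usage loop would call `select()`, observe a reward for each chosen arm, and feed it back through `update()`; the blocking bookkeeping then determines which arms are feasible in later rounds.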
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Finding Optimal Arms in Non-stochastic Combinatorial Bandits with
Semi-bandit Feedback and Finite Budget [6.759124697337311]
We consider the bandits problem with semi-bandit feedback under finite sampling budget constraints.
The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received.
We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies.
arXiv Detail & Related papers (2022-02-09T14:36:05Z) - The Countable-armed Bandit with Vanishing Arms [8.099977107670918]
We consider a bandit problem with countably many arms partitioned into finitely many "types".
A "non-stationary" distribution governs the relative abundance of each arm-type in the population of arms, aka the "arm-reservoir".
arXiv Detail & Related papers (2021-10-23T02:47:55Z) - Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit (MAB) problem.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z) - Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z) - A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z) - Contextual Blocking Bandits [35.235375147227124]
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards.
Playing an arm blocks it (across all contexts) for a fixed and known number of future time steps.
We propose a UCB-based variant of the full-information algorithm that guarantees $\mathcal{O}(\log T)$ $\alpha$-regret in $T$ time steps, matching the $\Omega(\log T)$ lower bound.
arXiv Detail & Related papers (2020-03-06T20:34:42Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.