Combinatorial Blocking Bandits with Stochastic Delays
- URL: http://arxiv.org/abs/2105.10625v1
- Date: Sat, 22 May 2021 02:46:04 GMT
- Title: Combinatorial Blocking Bandits with Stochastic Delays
- Authors: Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine
Caramanis, Sanjay Shakkottai
- Abstract summary: Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arm can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the maximum expected reward feasible subset among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above heuristic into a UCB-based bandit algorithm with sublinear (approximate) regret guarantees.
- Score: 33.65025386998747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has considered natural variations of the multi-armed bandit
problem, where the reward distribution of each arm is a special function of the
time passed since its last pulling. In this direction, a simple (yet widely
applicable) model is that of blocking bandits, where an arm becomes unavailable
for a deterministic number of rounds after each play. In this work, we extend
the above model in two directions: (i) We consider the general combinatorial
setting where more than one arm can be played at each round, subject to
feasibility constraints. (ii) We allow the blocking time of each arm to be
stochastic. We first study the computational/unconditional hardness of the
above setting and identify the necessary conditions for the problem to become
tractable (even in an approximate sense). Based on these conditions, we provide
a tight analysis of the approximation guarantee of a natural greedy heuristic
that always plays the maximum expected reward feasible subset among the
available (non-blocked) arms. When the arms' expected rewards are unknown, we
adapt the above heuristic into a bandit algorithm, based on UCB, for which we
provide sublinear (approximate) regret guarantees, matching the theoretical
lower bounds in the limiting case of absence of delays.
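To make the abstract's approach concrete, here is a minimal, hypothetical sketch of the greedy UCB idea: at each round, among the currently available (non-blocked) arms, greedily play the feasible subset with the largest UCB indices, then block each played arm for a stochastic delay. The cardinality constraint (at most $k$ arms per round) and the geometric delay distribution are illustrative assumptions, not the paper's exact setting.

```python
import math
import random

class GreedyBlockingUCB:
    """Illustrative sketch of a UCB-based greedy policy for combinatorial
    blocking bandits. Feasibility is modeled as a cardinality constraint
    (at most k arms per round); blocking delays are drawn geometrically.
    Both choices are assumptions made for this example."""

    def __init__(self, n_arms, k, delay_p=0.5):
        self.n_arms = n_arms
        self.k = k                       # feasibility: play at most k arms per round
        self.delay_p = delay_p           # success parameter of the geometric delay
        self.counts = [0] * n_arms       # number of pulls per arm
        self.sums = [0.0] * n_arms       # cumulative observed reward per arm
        self.blocked_until = [0] * n_arms
        self.t = 0

    def ucb_index(self, a):
        if self.counts[a] == 0:
            return float("inf")          # force initial exploration of each arm
        mean = self.sums[a] / self.counts[a]
        return mean + math.sqrt(2 * math.log(self.t + 1) / self.counts[a])

    def select(self):
        """Greedy step: pick the k available arms with the largest UCB index."""
        self.t += 1
        available = [a for a in range((self.n_arms))
                     if self.blocked_until[a] <= self.t]
        available.sort(key=self.ucb_index, reverse=True)
        return available[: self.k]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        # Stochastic blocking: the arm becomes unavailable for a
        # geometric number of future rounds.
        delay = 1
        while random.random() > self.delay_p:
            delay += 1
        self.blocked_until[arm] = self.t + delay
```

A usage loop would call `select()`, observe a reward for each chosen arm, and feed it back through `update()`; the blocking bookkeeping then determines which arms are feasible in later rounds.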
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Finding Optimal Arms in Non-stochastic Combinatorial Bandits with
Semi-bandit Feedback and Finite Budget [6.759124697337311]
We consider the bandits problem with semi-bandit feedback under finite sampling budget constraints.
The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received.
We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies.
arXiv Detail & Related papers (2022-02-09T14:36:05Z) - The Countable-armed Bandit with Vanishing Arms [8.099977107670918]
We consider a bandit problem with countably many arms partitioned into finitely many "types".
A "non-stationary" distribution governs the relative abundance of each arm-type in the population of arms, aka the "arm-reservoir".
arXiv Detail & Related papers (2021-10-23T02:47:55Z) - Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit (MAB) problem.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z) - Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z) - A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z) - Contextual Blocking Bandits [35.235375147227124]
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards.
Playing an arm blocks it (across all contexts) for a fixed and known number of future time steps.
We propose a UCB-based variant of the full-information algorithm that guarantees $\mathcal{O}(\log T)$ $\alpha$-regret in $T$ time steps, matching the $\Omega(\log T)$ lower bound.
arXiv Detail & Related papers (2020-03-06T20:34:42Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.