Sleeping Combinatorial Bandits
- URL: http://arxiv.org/abs/2106.01624v1
- Date: Thu, 3 Jun 2021 06:49:44 GMT
- Title: Sleeping Combinatorial Bandits
- Authors: Kumar Abhishek, Ganesh Ghalme, Sujit Gujar, Yadati Narahari
- Abstract summary: We adapt the well-known CUCB algorithm to the sleeping combinatorial bandits setting and refer to it as CSUCB.
We prove -- under mild smoothness conditions -- that the CSUCB algorithm achieves an $O(\log(T))$ instance-dependent regret guarantee and, when the range of the rewards is bounded, an $O(\sqrt{T \log(T)})$ regret guarantee.
Our results are quite general and hold in environments arising in practical applications, such as non-additive reward functions, volatile arm availability, and a variable number of base arms to be pulled.
- Score: 15.004764451770441
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we study an interesting combination of sleeping and
combinatorial stochastic bandits. In the mixed model studied here, at each
discrete time instant, an arbitrary availability set is generated from a
fixed set of base arms. An algorithm can select a subset of arms from
the availability set (sleeping bandits) and receive the corresponding
reward along with semi-bandit feedback (combinatorial bandits).
We adapt the well-known CUCB algorithm to the sleeping combinatorial bandits
setting and refer to it as CSUCB. We prove -- under mild smoothness conditions
-- that the CSUCB algorithm achieves an $O(\log(T))$ instance-dependent
regret guarantee. We further prove that (i) when the range of the rewards is
bounded, the regret guarantee of the CSUCB algorithm is $O(\sqrt{T \log(T)})$
and (ii) the instance-independent regret is $O(\sqrt[3]{T^2 \log(T)})$ in a
general setting. Our results are quite general and hold in environments
arising in practical applications, such as non-additive reward functions,
volatile arm availability, and a variable number of base arms to be pulled.
We validate the proven theoretical guarantees through experiments.
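As a concrete illustration of the loop described above -- CUCB run on each round's availability set, with semi-bandit updates on the chosen arms -- here is a minimal sketch. It is not the authors' implementation: it assumes Bernoulli base arms, an additive reward over a super-arm of size at most $k$, an exact top-$k$ oracle, and a generic UCB exploration term, whereas the paper also covers non-additive rewards, approximation oracles, and a variable number of base arms to be pulled. All names and parameters below are illustrative.

```python
import math
import random

def csucb(n_arms, horizon, k, mu, seed=0):
    """Illustrative sketch of CSUCB: CUCB restricted to each round's
    availability set, with semi-bandit updates.

    Hypothetical setup (not from the paper): Bernoulli base arms with
    means `mu`, additive reward over a super-arm of size <= k, and an
    exact top-k oracle.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms    # pulls per base arm
    means = [0.0] * n_arms   # empirical mean reward per base arm

    for t in range(1, horizon + 1):
        # Volatile availability: an arbitrary non-empty subset of base arms.
        available = [a for a in range(n_arms) if rng.random() < 0.8] or [0]

        # UCB index of an available arm; unplayed arms get priority.
        def index(a):
            if counts[a] == 0:
                return float("inf")
            return means[a] + math.sqrt(1.5 * math.log(t) / counts[a])

        # Oracle step: with additive rewards, the optimistic super-arm
        # is simply the top-k available arms by UCB index.
        chosen = sorted(available, key=index, reverse=True)[:k]

        # Semi-bandit feedback: observe each chosen arm's outcome.
        for a in chosen:
            x = 1.0 if rng.random() < mu[a] else 0.0
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]

    return means

if __name__ == "__main__":
    # Toy run: estimates of frequently chosen arms approach their true means.
    est = csucb(n_arms=6, horizon=20000, k=2,
                mu=[0.9, 0.8, 0.5, 0.4, 0.3, 0.2])
    print([round(m, 2) for m in est])
```

With a non-additive reward function, the top-$k$ step above would be replaced by an (approximation) oracle over the availability set, which is the setting in which the paper's guarantees are stated.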
Related papers
- Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise [45.60098988395789]
We propose a new approach for constructing UCB-type algorithms for multi-armed bandits.
We show that, in the case of symmetric noise in the reward, an $O(\log T \sqrt{KT \log T})$ regret bound can be achieved.
arXiv Detail & Related papers (2024-02-10T22:38:21Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Combinatorial Neural Bandits [10.463365653675694]
We consider a contextual bandit problem where in each round a learning agent selects a subset of arms and receives feedback on the selected arms according to their scores.
We propose two algorithms: Combinatorial Neural UCB ($\texttt{CN-UCB}$) and Combinatorial Thompson Sampling ($\texttt{CN-TS}$).
arXiv Detail & Related papers (2023-05-31T23:27:58Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which exploits the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
arXiv Detail & Related papers (2023-01-17T17:49:04Z) - Contextual Combinatorial Volatile Bandits via Gaussian Processes [10.312968200748116]
We consider a contextual bandit problem with a set of available base arms and their contexts.
We propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB).
We experimentally show that our algorithms vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
arXiv Detail & Related papers (2021-10-05T18:02:10Z) - Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k=1}^{K} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting.
arXiv Detail & Related papers (2021-09-10T15:41:07Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Improved Sleeping Bandits with Stochastic Action Sets and Adversarial
Rewards [59.559028338399855]
We consider the problem of sleeping bandits with stochastic action sets and adversarial rewards.
In this paper, we provide a new computationally efficient algorithm inspired by EXP3 that satisfies a regret of order $O(\sqrt{T})$.
arXiv Detail & Related papers (2020-04-14T00:41:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.