Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial
Rewards
- URL: http://arxiv.org/abs/2004.06248v2
- Date: Sat, 8 Aug 2020 12:45:26 GMT
- Title: Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial
Rewards
- Authors: Aadirupa Saha, Pierre Gaillard, Michal Valko
- Abstract summary: We consider the problem of sleeping bandits with stochastic action sets and adversarial rewards.
In this paper, we provide a new computationally efficient algorithm inspired by EXP3 satisfying a regret of order $O(\sqrt{T})$.
- Score: 59.559028338399855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the problem of sleeping bandits with stochastic
action sets and adversarial rewards. In this setting, in contrast to most work
in bandits, the actions may not be available at all times. For instance, some
products might be out of stock in item recommendation. The best existing
efficient (i.e., polynomial-time) algorithms for this problem only guarantee an
$O(T^{2/3})$ upper-bound on the regret. Yet, inefficient algorithms based on
EXP4 can achieve $O(\sqrt{T})$. In this paper, we provide a new computationally
efficient algorithm inspired by EXP3 satisfying a regret of order $O(\sqrt{T})$
when the availabilities of each action $i \in \mathcal{A}$ are independent. We then
study the most general version of the problem where at each round available
sets are generated from some unknown arbitrary distribution (i.e., without the
independence assumption) and propose an efficient algorithm with an $O(\sqrt{2^K T})$ regret guarantee. Our theoretical results are corroborated with
experimental evaluations.
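For concreteness, here is a minimal sketch (not the authors' exact method) of an EXP3-style learner that renormalises its exponential weights over whichever arms are available in a round and applies an importance-weighted update to the pulled arm. The environment callbacks `get_available_set` and `get_reward` are hypothetical placeholders standing in for the stochastic availabilities and adversarial rewards.

```python
import numpy as np

def sleeping_exp3(K, T, eta, get_available_set, get_reward, rng=None):
    """Simplified EXP3-style learner restricted to the currently available arms.

    get_available_set(t) -> iterable of arm indices available at round t
    get_reward(t, arm)   -> reward in [0, 1] for the pulled arm (adversarial)
    Both callbacks are hypothetical placeholders for the environment.
    """
    rng = rng or np.random.default_rng()
    weights = np.ones(K)
    total_reward = 0.0
    for t in range(T):
        avail = np.array(sorted(get_available_set(t)))
        # Renormalise the exponential weights over the available arms only.
        probs = weights[avail] / weights[avail].sum()
        idx = rng.choice(len(avail), p=probs)
        arm = avail[idx]
        r = get_reward(t, arm)
        total_reward += r
        # Inverse-propensity (importance-weighted) update of the pulled arm.
        weights[arm] *= np.exp(eta * r / probs[idx])
    return total_reward
```

With a learning rate of roughly $\eta \approx \sqrt{\log K / (KT)}$ this captures the spirit of the EXP3-based approach, but the paper's $O(\sqrt{T})$ guarantee under independent availabilities relies on a more refined sampling rule and analysis than this sketch.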
Related papers
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy, where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed
Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z) - An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low
Regret [7.059472280274009]
We propose a new efficient algorithm with lower regret than even previous inefficient ones.
For $N$ agents, $K$ arms, and $T$ rounds, our approach has a regret bound of $\tilde{O}(\sqrt{NKT} + NK)$.
We also complement our efficient algorithm with an inefficient approach with $\tilde{O}(\sqrt{KT} + N^2 K)$ regret.
arXiv Detail & Related papers (2022-09-23T19:15:43Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{dST})$ where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary
Dueling Bandits [27.279654173896372]
We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time-varying preferences.
This is an online learning setup where the agent chooses a pair of items at each round and observes only relative binary win-loss feedback for this pair.
arXiv Detail & Related papers (2021-11-06T16:46:55Z) - No Regrets for Learning the Prior in Bandits [30.478188057004175]
AdaTS is a Thompson sampling algorithm that adapts sequentially to bandit tasks.
It is a fully-Bayesian algorithm that can be implemented efficiently in several classes of bandit problems.
arXiv Detail & Related papers (2021-07-13T15:51:32Z) - Sleeping Combinatorial Bandits [15.004764451770441]
We adapt the well-known CUCB algorithm to the sleeping bandits setting and refer to it as CSUCB (a simplified single-arm illustration appears after this list).
We prove -- under mild conditions -- that the CSUCB algorithm achieves an $O(\sqrt{T \log(T)})$ instance-dependent regret guarantee.
Our results are quite general and hold in general environments -- such as non-additive reward functions, volatile arm availability, and a variable number of base arms to be pulled -- arising in practical applications.
arXiv Detail & Related papers (2021-06-03T06:49:44Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
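As referenced in the Sleeping Combinatorial Bandits entry above, a common way to adapt UCB-style methods to sleeping bandits is to take the optimistic choice only over the arms that are awake in a round. The snippet below is a rough single-play illustration of that idea, not the combinatorial CSUCB algorithm itself; `get_available_set` and `get_reward` are again hypothetical environment placeholders.

```python
import numpy as np

def sleeping_ucb(K, T, get_available_set, get_reward):
    """Single-play UCB1 restricted to whichever arms are awake each round."""
    counts = np.zeros(K)
    means = np.zeros(K)
    total_reward = 0.0
    for t in range(1, T + 1):
        avail = np.array(sorted(get_available_set(t)))
        untried = avail[counts[avail] == 0]
        if untried.size > 0:
            # Pull an available arm that has never been tried yet.
            arm = int(untried[0])
        else:
            # Otherwise pick the awake arm with the largest upper confidence bound.
            ucb = means[avail] + np.sqrt(2.0 * np.log(t) / counts[avail])
            arm = int(avail[np.argmax(ucb)])
        r = get_reward(t, arm)
        total_reward += r
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return total_reward
```

This mirrors the stochastic-reward, stochastic-availability regime of the CUCB-to-CSUCB adaptation; the adversarial-reward setting of the main paper instead calls for the EXP3-style approach sketched earlier.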
This list is automatically generated from the titles and abstracts of the papers in this site.