Thompson Sampling for Bandits with Clustered Arms
- URL: http://arxiv.org/abs/2109.01656v1
- Date: Mon, 6 Sep 2021 08:58:01 GMT
- Title: Thompson Sampling for Bandits with Clustered Arms
- Authors: Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson
- Abstract summary: We show, theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost.
Our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
- Score: 7.237493755167875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose algorithms based on a multi-level Thompson sampling scheme, for
the stochastic multi-armed bandit and its contextual variant with linear
expected rewards, in the setting where arms are clustered. We show, both
theoretically and empirically, how exploiting a given cluster structure can
significantly improve the regret and computational cost compared to using
standard Thompson sampling. In the case of the stochastic multi-armed bandit we
give upper bounds on the expected cumulative regret showing how it depends on
the quality of the clustering. Finally, we perform an empirical evaluation
showing that our algorithms perform well compared to previously proposed
algorithms for bandits with clustered arms.
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits.
Our algorithm achieves nearly minimax-optimal regret, i.e., $tildemathcalO(dsqrt T)$, where $d$ is the model dimension and $T$ is the time horizon.
arXiv Detail & Related papers (2024-04-09T04:45:18Z) - An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration
of Multi-Armed Bandit [65.268245109828]
We study the real-valued pure exploration problem in the multi-armed bandit (R-CPE-MAB)
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits.
We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Analysis of Thompson Sampling for Partially Observable Contextual
Multi-Armed Bandits [1.8275108630751844]
We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits.
We show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension.
arXiv Detail & Related papers (2021-10-23T08:51:49Z) - Batched Thompson Sampling for Multi-Armed Bandits [9.467098519620263]
We study Thompson Sampling algorithms for multi-armed bandits in the batched setting.
We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2021-08-15T20:47:46Z) - Thompson Sampling for Unimodal Bandits [21.514495320038712]
We propose a Thompson Sampling algorithm for emphunimodal bandits, where the expected reward is unimodal over the partially ordered arms.
For Gaussian rewards, the regret of our algorithm is $mathcalO(log T)$, which is far better than standard Thompson Sampling algorithms.
arXiv Detail & Related papers (2021-06-15T14:40:34Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS)
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Distributed Thompson Sampling [22.813570532809212]
We study a cooperative multi-agent multi-armed bandits with M agents and K arms.
The goal of the agents is to minimize the cumulative regret.
We adapt a traditional Thompson Sampling algoirthm under the distributed setting.
We propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively.
arXiv Detail & Related papers (2020-12-03T09:42:37Z) - The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.