Related papers: Thompson Sampling for Bandits with Clustered Arms

Thompson Sampling for Bandits with Clustered Arms

URL: http://arxiv.org/abs/2109.01656v1
Date: Mon, 6 Sep 2021 08:58:01 GMT
Title: Thompson Sampling for Bandits with Clustered Arms
Authors: Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson
Abstract summary: We show, theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost. Our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
Score: 7.237493755167875
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.

Related papers

Fast Online Learning with Gaussian Prior-Driven Hierarchical Unimodal Thompson Sampling [8.624069523932372]
We study a type of Multi-Armed Bandit (MAB) problems in which arms with a Gaussian reward feedback are clustered.<n>Such an arm setting finds applications in many real-world problems, for example, mmWave communications and portfolio management with risky assets.
arXiv Detail & Related papers (2026-02-17T19:46:00Z)
Representative Action Selection for Large Action Space Meta-Bandits [49.386906771833274]
We study the problem of selecting a subset from a large action space shared by a family of bandits.<n>We assume that similar actions tend to have related payoffs, modeled by a Gaussian process.<n>We propose a simple epsilon-net algorithm to select a representative subset.
arXiv Detail & Related papers (2025-05-23T18:08:57Z)
Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
Feel-Good Thompson Sampling for Contextual Dueling Bandits [49.450050682705026]
We propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits. At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits. Our algorithm achieves nearly minimax-optimal regret, i.e., $tildemathcalO(dsqrt T)$, where $d$ is the model dimension and $T$ is the time horizon.
arXiv Detail & Related papers (2024-04-09T04:45:18Z)
An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued pure exploration problem in the multi-armed bandit (R-CPE-MAB) Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits. We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z)
Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits [1.8275108630751844]
We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits. We show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension.
arXiv Detail & Related papers (2021-10-23T08:51:49Z)
Batched Thompson Sampling for Multi-Armed Bandits [9.467098519620263]
We study Thompson Sampling algorithms for multi-armed bandits in the batched setting. We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2021-08-15T20:47:46Z)
Thompson Sampling for Unimodal Bandits [21.514495320038712]
We propose a Thompson Sampling algorithm for emphunimodal bandits, where the expected reward is unimodal over the partially ordered arms. For Gaussian rewards, the regret of our algorithm is $mathcalO(log T)$, which is far better than standard Thompson Sampling algorithms.
arXiv Detail & Related papers (2021-06-15T14:40:34Z)
Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased. We propose a novel approach for dueling bandits based on information-directed sampling (IDS) Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
Distributed Thompson Sampling [22.813570532809212]
We study a cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt a traditional Thompson Sampling algoirthm under the distributed setting. We propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively.
arXiv Detail & Related papers (2020-12-03T09:42:37Z)
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents. We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.