Distributed Thompson Sampling
- URL: http://arxiv.org/abs/2012.01789v1
- Date: Thu, 3 Dec 2020 09:42:37 GMT
- Title: Distributed Thompson Sampling
- Authors: Jing Dong, Tan Li, Shaolei Ren, Linqi Song
- Abstract summary: We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms.
The goal of the agents is to minimize the cumulative regret.
We adapt a traditional Thompson Sampling algorithm to the distributed setting.
We propose a distributed Elimination based Thompson Sampling algorithm that allows the agents to learn collaboratively.
- Score: 22.813570532809212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a cooperative multi-agent multi-armed bandit problem with M agents and K
arms. The goal of the agents is to minimize the cumulative regret. We adapt a
traditional Thompson Sampling algorithm to the distributed setting. However,
given the agents' ability to communicate, we note that communication may further
reduce the upper bound on the regret of a distributed Thompson Sampling
approach. To further improve the performance of distributed Thompson Sampling,
we propose a distributed Elimination based Thompson Sampling algorithm that
allows the agents to learn collaboratively. We analyse the algorithm under
Bernoulli rewards and derive a problem-dependent upper bound on the cumulative
regret.
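
As a concrete baseline for the abstract above, here is a minimal single-agent Beta-Bernoulli Thompson Sampling loop of the sort the distributed variants build on. This is an illustrative sketch only: the function name, simulation setup, and all parameters are assumptions, not taken from the paper.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, rng=None):
    """Minimal single-agent Beta-Bernoulli Thompson Sampling (illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    k = len(true_means)
    successes = np.zeros(k)  # per-arm count of reward 1 (Beta alpha - 1)
    failures = np.zeros(k)   # per-arm count of reward 0 (Beta beta - 1)
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))          # play the arm with the best draw
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]       # expected (pseudo-)regret
    return regret

# Example: K = 3 arms, T = 10,000 rounds.
print(thompson_sampling_bernoulli([0.4, 0.5, 0.6], 10_000))
```

In the distributed setting of the paper, the M agents would additionally communicate, for instance by merging per-arm success/failure counts or by eliminating arms collaboratively; the precise communication protocol and elimination rule are given in the paper, not sketched here.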
Related papers
- Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring new information to gain long-term benefits.
We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits (a generic Langevin posterior-sampling sketch appears after this list).
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Thompson Sampling on Asymmetric $\alpha$-Stable Bandits [0.0]
The multi-armed bandit problem can be adapted to different applications by changing the reward distribution.
Thompson Sampling is a common method for solving the multi-armed bandit problem.
arXiv Detail & Related papers (2022-03-19T01:55:08Z)
- Thompson Sampling with Unrestricted Delays [18.059421254087976]
We investigate properties of Thompson Sampling in the multi-armed bandit problem with delayed feedback.
Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms.
In extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals.
arXiv Detail & Related papers (2022-02-24T23:56:36Z)
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with those of reward maximization (a simple mixing sketch appears after this list).
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
- Thompson Sampling for Bandits with Clustered Arms [7.237493755167875]
We show, theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost.
Our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
arXiv Detail & Related papers (2021-09-06T08:58:01Z)
- Batched Thompson Sampling for Multi-Armed Bandits [9.467098519620263]
We study Thompson Sampling algorithms for multi-armed bandits in the batched setting.
We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2021-08-15T20:47:46Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- MOTS: Minimax Optimal Thompson Sampling [89.2370817955411]
It has remained an open problem whether Thompson sampling can match the minimax lower bound $\Omega(\sqrt{KT})$ for $K$-armed bandit problems.
We propose a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step.
We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the optimal regret bound for Gaussian rewards when $T$ approaches infinity.
arXiv Detail & Related papers (2020-03-03T21:24:39Z)
- On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
However, it suffers from a significant computational limitation, arising from the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
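
The two Langevin-based entries above (LMC-TS and "On Thompson Sampling with Langevin Algorithms") both replace exact posterior sampling with gradient-based MCMC. Below is a hedged sketch of a generic unadjusted Langevin update targeting the posterior of a linear-Gaussian reward model; the model, step size, and iteration count are placeholder assumptions, not either paper's actual algorithm.

```python
import numpy as np

def langevin_posterior_sample(X, r, theta0, step=1e-3, n_steps=200,
                              prior_var=1.0, noise_var=1.0, rng=None):
    """Unadjusted Langevin iterations targeting the posterior of a linear
    reward model r ~ N(X @ theta, noise_var) with a N(0, prior_var * I) prior.
    Returns an approximate posterior draw for a Thompson Sampling step."""
    if rng is None:
        rng = np.random.default_rng()
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        # Gradient of the log posterior: Gaussian prior term + likelihood term.
        grad = -theta / prior_var + X.T @ (r - X @ theta) / noise_var
        # Langevin update: a gradient step plus scaled Gaussian noise.
        theta += step * grad + np.sqrt(2.0 * step) * rng.standard_normal(theta.shape)
    return theta
```

In a contextual-bandit round, such iterations would typically be warm-started from the previous draw, and the agent pulls the arm whose features maximize the predicted reward under the sampled parameter.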
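
For the adaptive-experiments entry above, one simple way to combine uniform random assignment with reward maximization (a sketch of the idea only, not necessarily the paper's two algorithms; `p_uniform` is a hypothetical parameter) is to randomize each round between a uniform assignment and a Thompson draw:

```python
import numpy as np

def mixed_assignment(successes, failures, p_uniform=0.2, rng=None):
    """With probability p_uniform, assign an arm uniformly at random (keeping
    data amenable to standard statistical analysis); otherwise take a
    Beta-Bernoulli Thompson Sampling draw (reward maximization). Illustrative."""
    if rng is None:
        rng = np.random.default_rng()
    k = len(successes)
    if rng.random() < p_uniform:
        return int(rng.integers(k))  # uniform random assignment
    # Thompson Sampling draw from per-arm Beta posteriors.
    samples = rng.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
    return int(np.argmax(samples))
```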