Thompson Sampling with Unrestricted Delays
- URL: http://arxiv.org/abs/2202.12431v1
- Date: Thu, 24 Feb 2022 23:56:36 GMT
- Title: Thompson Sampling with Unrestricted Delays
- Authors: Han Wu and Stefan Wager
- Abstract summary: We investigate properties of Thompson Sampling in the multi-armed bandit problem with delayed feedback.
Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms.
In extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals.
- Score: 18.059421254087976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate properties of Thompson Sampling in the stochastic multi-armed
bandit problem with delayed feedback. In a setting with i.i.d. delays, we
establish to our knowledge the first regret bounds for Thompson Sampling with
arbitrary delay distributions, including ones with unbounded expectation. Our
bounds are qualitatively comparable to the best available bounds derived via
ad-hoc algorithms, and only depend on delays via selected quantiles of the
delay distributions. Furthermore, in extensive simulation experiments, we find
that Thompson Sampling outperforms a number of alternative proposals, including
methods specifically designed for settings with delayed feedback.
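The setting in the abstract can be sketched in code: Thompson Sampling proceeds as usual, but each reward arrives only after a random delay, so the posterior at round t reflects just the feedback that has arrived by then. This is a minimal illustrative sketch for Bernoulli arms with Beta posteriors, not the paper's exact experimental setup; the function and parameter names are invented for illustration.

```python
import random

def delayed_thompson_sampling(T, true_means, delay_sampler, seed=0):
    """Thompson Sampling for Bernoulli bandits with delayed feedback.

    Rewards generated at round t become visible at round t + delay;
    the Beta posteriors are updated only from rewards that have arrived.
    """
    rng = random.Random(seed)
    K = len(true_means)
    alpha = [1] * K            # Beta(1, 1) prior per arm
    beta = [1] * K
    pending = []               # (arrival_round, arm, reward)
    pulls = [0] * K

    for t in range(T):
        # Incorporate all feedback whose delay has elapsed.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, reward in arrived:
            alpha[arm] += reward
            beta[arm] += 1 - reward
        # Sample a mean from each arm's posterior; play the argmax.
        samples = [rng.betavariate(alpha[k], beta[k]) for k in range(K)]
        arm = max(range(K), key=lambda k: samples[k])
        pulls[arm] += 1
        reward = 1 if rng.random() < true_means[arm] else 0
        pending.append((t + delay_sampler(rng), arm, reward))
    return pulls
```

With an unbounded delay distribution (e.g. geometric), some feedback may never arrive within the horizon; the point of the paper's bounds is that regret degrades only through selected quantiles of the delay distribution, not its expectation.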
Related papers
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - A Reduction-based Framework for Sequential Decision Making with Delayed
Feedback [53.79893086002961]
We study delayed feedback in general multi-agent sequential decision making.
We propose a novel reduction-based framework, which turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm.
arXiv Detail & Related papers (2023-02-03T01:16:09Z) - Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
arXiv Detail & Related papers (2022-11-11T02:23:39Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential
Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling, termed Expulli, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed
Bandits [0.0]
We show that Thompson sampling can maintain its performance even if it receives delayed feedback in batches.
We propose an adaptive scheme that reduces the number of batches to $\Theta(\log T)$ while maintaining the same performance.
arXiv Detail & Related papers (2021-10-01T01:28:40Z) - Batched Thompson Sampling for Multi-Armed Bandits [9.467098519620263]
We study Thompson Sampling algorithms for multi-armed bandits in the batched setting.
We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2021-08-15T20:47:46Z) - Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z) - Distributed Thompson Sampling [22.813570532809212]
We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms.
The goal of the agents is to minimize the cumulative regret.
We adapt the traditional Thompson Sampling algorithm to the distributed setting.
We propose a distributed elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively.
arXiv Detail & Related papers (2020-12-03T09:42:37Z) - Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
It suffers from a significant computational limitation, arising from the need to sample from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
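The MCMC approach described in the entry above replaces exact posterior sampling in Thompson Sampling with approximate samples from a Markov chain. A generic way to do this is the unadjusted Langevin algorithm (ULA), which follows noisy gradient steps on the log-posterior. The sketch below is a simplified, hedged illustration for a single Gaussian arm (Gaussian prior and likelihood), not the paper's specific algorithms; all names and parameters are assumptions for illustration.

```python
import math
import random

def langevin_posterior_sample(rewards, prior_var=1.0, noise_var=1.0,
                              step=0.01, n_steps=200, rng=None):
    """Approximate one posterior sample of a Gaussian arm mean via the
    unadjusted Langevin algorithm, as a stand-in for exact posterior
    sampling inside a Thompson Sampling loop.

    Model: theta ~ N(0, prior_var), rewards ~ N(theta, noise_var).
    """
    rng = rng or random.Random()
    n = len(rewards)
    s = sum(rewards)
    theta = 0.0
    for _ in range(n_steps):
        # Gradient of the log posterior: prior term plus likelihood term.
        grad = -theta / prior_var + (s - n * theta) / noise_var
        # Langevin step: gradient ascent plus Gaussian exploration noise.
        theta += step * grad + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return theta
```

Each bandit round would draw one such approximate sample per arm and pull the argmax, trading exactness of the posterior sample for applicability to models where the posterior has no closed form.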
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.