Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
- URL: http://arxiv.org/abs/2106.02436v1
- Date: Fri, 4 Jun 2021 12:26:06 GMT
- Title: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
- Authors: Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour
- Abstract summary: We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
- Score: 54.25616645675032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the stochastic Multi-Armed Bandit (MAB) problem with random delays
in the feedback received by the algorithm. We consider two settings: the
reward-dependent delay setting, where realized delays may depend on the
stochastic rewards, and the reward-independent delay setting. Our main
contribution is algorithms that achieve near-optimal regret in each of the
settings, with an additional additive dependence on the quantiles of the delay
distribution. Our results do not make any assumptions on the delay
distributions: in particular, we do not assume they come from any parametric
family of distributions and allow for unbounded support and expectation; we
further allow for infinite delays where the algorithm might occasionally not
observe any feedback.
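To make the setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of a UCB1-style policy facing random, possibly infinite, feedback delays: each pull's reward only becomes visible to the learner after a sampled delay, and pulls with infinite delay are never observed. The names `delayed_ucb` and `delay_sampler` are hypothetical.

```python
import heapq
import math
import random

def delayed_ucb(means, horizon, delay_sampler, seed=0):
    """UCB1-style policy under random feedback delays (illustrative sketch).

    At time t the learner only uses feedback whose arrival time is <= t;
    a pull whose sampled delay is infinite is never observed at all.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # observed samples per arm
    sums = [0.0] * k          # observed reward sums per arm
    pending = []              # min-heap of (arrival_time, arm, reward)
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Absorb all feedback that has arrived by now.
        while pending and pending[0][0] <= t:
            _, arm, r = heapq.heappop(pending)
            counts[arm] += 1
            sums[arm] += r

        # Keep pulling arms with no observed feedback yet; otherwise
        # pick the arm maximizing the UCB1 index over observed samples.
        unexplored = [a for a in range(k) if counts[a] == 0]
        if unexplored:
            arm = unexplored[0]
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))

        reward = 1.0 if rng.random() < means[arm] else 0.0
        total_reward += reward
        delay = delay_sampler(rng)
        if delay != float("inf"):   # infinite delay: feedback is lost
            heapq.heappush(pending, (t + delay, arm, reward))

    return total_reward
```

For example, `delayed_ucb([0.9, 0.1], 2000, lambda rng: 5)` runs two Bernoulli arms with a fixed delay of 5 rounds; replacing the constant delay with a heavy-tailed or occasionally infinite sampler mimics the unrestricted-delay regime the paper studies.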
Related papers
- Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays [25.757803459592104]
We study the semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints.
This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available.
We present new bandit algorithms to select arms under unrestricted feedback delays based on their merits.
arXiv Detail & Related papers (2024-07-22T07:36:27Z)
- Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation.
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies.
We devise DEZ, a model-based algorithm that optimizes over the class of Markov policies.
arXiv Detail & Related papers (2024-04-08T12:19:04Z)
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z)
- Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback [39.12903814606534]
This paper investigates the problem of multi-armed bandits with submodular (in expectation) rewards and full-bandit delayed feedback.
The delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components.
The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.
arXiv Detail & Related papers (2023-03-23T18:38:33Z)
- A Reduction-based Framework for Sequential Decision Making with Delayed Feedback [53.79893086002961]
We study delayed feedback in general multi-agent sequential decision making.
We propose a novel reduction-based framework, which turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm.
arXiv Detail & Related papers (2023-02-03T01:16:09Z)
- Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely the Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB).
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z)
- Thompson Sampling with Unrestricted Delays [18.059421254087976]
We investigate properties of Thompson Sampling in the multi-armed bandit problem with delayed feedback.
Our bounds are qualitatively comparable to the best available bounds derived via ad-hoc algorithms.
In extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals.
arXiv Detail & Related papers (2022-02-24T23:56:36Z)
- Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called PatientBandits.
We provide both problem-dependent and problem-independent upper bounds on the regret, as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.