Stochastic bandits with arm-dependent delays
- URL: http://arxiv.org/abs/2006.10459v1
- Date: Thu, 18 Jun 2020 12:13:58 GMT
- Title: Stochastic bandits with arm-dependent delays
- Authors: Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko
- Abstract summary: We propose a simple but efficient UCB-based algorithm called the PatientBandits.
We provide both problem-dependent and problem-independent bounds on the regret as well as performance lower bounds.
- Score: 102.63128271054741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Significant work has been recently dedicated to the stochastic delayed bandit
setting because of its relevance in applications. The applicability of existing
algorithms is however restricted by the fact that strong assumptions are often
made on the delay distributions, such as full observability, restrictive shape
constraints, or uniformity over arms. In this work, we weaken them
significantly and only assume that there is a bound on the tail of the delay.
In particular, we cover the important case where the delay distributions vary
across arms, and the case where the delays are heavy-tailed. Addressing these
difficulties, we propose a simple but efficient UCB-based algorithm called the
PatientBandits. We provide both problem-dependent and problem-independent
bounds on the regret as well as performance lower bounds.
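The abstract describes a UCB-style algorithm that copes with arm-dependent, possibly heavy-tailed delays by acting only on the feedback that has arrived so far. The sketch below is a minimal, hypothetical illustration of that general idea (a standard UCB1 index computed over observed feedback only), not the paper's actual PatientBandits algorithm; the `pull` and `delay` interfaces are assumptions made for the example.

```python
import math

def delayed_ucb(n_arms, horizon, pull, delay):
    """Minimal UCB sketch that uses only feedback observed so far.

    pull(a)  -> reward in [0, 1] for arm a (hypothetical interface)
    delay(a) -> number of rounds until that reward is observed
    """
    pending = []            # (arrival_round, arm, reward)
    counts = [0] * n_arms   # observed feedback count per arm
    sums = [0.0] * n_arms   # observed reward sum per arm
    chosen = []
    for t in range(1, horizon + 1):
        # Collect feedback whose (arm-dependent) delay has elapsed.
        ready = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, a, r in ready:
            counts[a] += 1
            sums[a] += r
        # Optimistic index over observed data; unobserved arms first.
        def index(a):
            if counts[a] == 0:
                return float("inf")
            return sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        a = max(range(n_arms), key=index)
        chosen.append(a)
        # The reward is realized now but only observed after delay(a) rounds.
        pending.append((t + delay(a), a, pull(a)))
    return chosen
```

With deterministic rewards and a short constant delay, the better arm dominates the play counts after an initial exploration phase; the paper's contribution is precisely to control what happens when delays vary across arms and are heavy-tailed, which this toy sketch does not attempt to analyze.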
Related papers
- Biased Dueling Bandits with Stochastic Delayed Feedback [6.167074802065416]
We present two algorithms designed to handle situations involving delay.
Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay.
The second algorithm is tailored for situations where the distribution is unknown, but only the expected value of delay is available.
arXiv Detail & Related papers (2024-08-26T19:49:12Z)
- Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays [25.757803459592104]
We study the semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints.
This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available.
We present new bandit algorithms to select arms under unrestricted feedback delays based on their merits.
arXiv Detail & Related papers (2024-07-22T07:36:27Z)
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards [7.0997346625024]
We formalize a non-stationary and delayed semi-bandit problem with causally related rewards.
We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making.
We evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of COVID-19 in Italy.
arXiv Detail & Related papers (2023-07-18T09:22:33Z)
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
- Nonstochastic Bandits and Experts with Arm-Dependent Delays [17.272515865592542]
We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.
arXiv Detail & Related papers (2021-11-02T13:36:11Z)
- Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits [66.02233330016435]
We provide problem-dependent guarantees on both the regret and the amount of feedback asked for.
We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
arXiv Detail & Related papers (2021-10-12T03:24:57Z)
- Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences.