Delayed Feedback in Generalised Linear Bandits Revisited
- URL: http://arxiv.org/abs/2207.10786v4
- Date: Tue, 11 Apr 2023 11:52:55 GMT
- Title: Delayed Feedback in Generalised Linear Bandits Revisited
- Authors: Benjamin Howson, Ciara Pike-Burke, Sarah Filippi
- Abstract summary: We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner.
We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon.
- Score: 5.349852254138085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The stochastic generalised linear bandit is a well-understood model for
sequential decision-making problems, with many algorithms achieving
near-optimal regret guarantees under immediate feedback. However, the stringent
requirement for immediate rewards is unmet in many real-world applications
where the reward is almost always delayed. We study the phenomenon of delayed
rewards in generalised linear bandits in a theoretical manner. We show that a
natural adaptation of an optimistic algorithm to the delayed feedback achieves
a regret bound where the penalty for the delays is independent of the horizon.
This result significantly improves upon existing work, where the best known
regret bound has the delay penalty increasing with the horizon. We verify our
theoretical results through experiments on simulated data.
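The mechanism the abstract describes — an optimistic algorithm that simply waits for delayed rewards and updates its estimator only with feedback that has arrived — can be illustrated with a minimal simulation. The sketch below is a generic UCB-style linear bandit under random delays, not the authors' exact algorithm or analysis; the arm set, noise level, and delay model are illustrative assumptions.

```python
import numpy as np

def delayed_ucb_bandit(arms, theta_star, horizon, max_delay, alpha=1.0, seed=0):
    """Optimistic (UCB-style) linear bandit with delayed rewards.

    A reward generated at round t only enters the least-squares
    estimator once its random delay has elapsed; until then it sits
    in a `pending` queue. This mirrors the natural adaptation to
    delayed feedback discussed in the abstract, in simplified form.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = np.eye(d)                # regularised design matrix
    b = np.zeros(d)              # sum of x * r over *observed* rewards
    pending = []                 # (arrival_round, action, reward)
    total_reward = 0.0
    for t in range(horizon):
        # absorb any feedback whose delay has elapsed
        ready = [(s, x, r) for (s, x, r) in pending if s <= t]
        pending = [(s, x, r) for (s, x, r) in pending if s > t]
        for _, x, r in ready:
            V += np.outer(x, x)
            b += r * x
        theta_hat = np.linalg.solve(V, b)
        V_inv = np.linalg.inv(V)
        # optimistic index: estimate plus exploration bonus
        ucb = arms @ theta_hat + alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        x = arms[int(np.argmax(ucb))]
        r = float(x @ theta_star + 0.1 * rng.standard_normal())
        total_reward += r
        # reward becomes visible after a random delay in [1, max_delay]
        pending.append((t + 1 + rng.integers(0, max_delay), x, r))
    return total_reward
```

Running this with a clear gap between arms shows the learner still concentrating on the optimal action despite the delays, consistent with the intuition that delays cost only an additive penalty.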
Related papers
- Biased Dueling Bandits with Stochastic Delayed Feedback [6.167074802065416]
We present two algorithms designed to handle situations involving delay.
Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay.
The second algorithm is tailored for situations where the delay distribution is unknown and only its expected value is available.
arXiv Detail & Related papers (2024-08-26T19:49:12Z) - Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays [25.757803459592104]
We study the semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints.
This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available.
We present new bandit algorithms to select arms under unrestricted feedback delays based on their merits.
arXiv Detail & Related papers (2024-07-22T07:36:27Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Posterior Sampling with Delayed Feedback for Reinforcement Learning with
Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3 H^3 T} + d^2 H^2 \mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z) - A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z) - Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards [0.4194295877935867]
In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays.
We propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards.
We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem.
arXiv Detail & Related papers (2023-03-01T16:22:22Z) - Reward Imputation with Sketching for Contextual Batched Bandits [48.80803376405073]
Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode.
Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information.
We propose Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching.
arXiv Detail & Related papers (2022-10-13T04:26:06Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When
Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB).
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Nonstochastic Bandits and Experts with Arm-Dependent Delays [17.272515865592542]
We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.
arXiv Detail & Related papers (2021-11-02T13:36:11Z) - Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z) - Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called the PatientBandits.
We provide both problem-dependent and problem-independent upper bounds on the regret, as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
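Several of the papers above share the same setting: a classical index policy run when each arm's feedback arrives only after an arm-dependent delay. The sketch below is a generic UCB1 variant in that setting, intended only to illustrate the mechanics; it is not the PatientBandits algorithm, and the fixed per-arm delays and Gaussian rewards are illustrative assumptions.

```python
import numpy as np

def ucb1_with_arm_delays(means, delays, horizon, seed=0):
    """UCB1 under arm-dependent feedback delays (illustrative sketch).

    Pulling arm k at round t yields a reward that only becomes
    observable at round t + delays[k]; the UCB index is computed
    from arrived feedback alone. Returns the pull counts per arm.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)              # arrived observations per arm
    sums = np.zeros(K)                # arrived reward totals per arm
    pulls = np.zeros(K, dtype=int)    # pulls issued (arrived or not)
    inflight = []                     # (arrival_round, arm, reward)
    for t in range(horizon):
        arrived = [(s, k, r) for (s, k, r) in inflight if s <= t]
        inflight = [(s, k, r) for (s, k, r) in inflight if s > t]
        for _, k, r in arrived:
            counts[k] += 1
            sums[k] += r
        if np.any(pulls == 0):
            # initialisation: issue one pull of every arm first
            k = int(np.argmin(pulls))
        else:
            obs = np.maximum(counts, 1)  # guard arms with no arrived feedback yet
            k = int(np.argmax(sums / obs + np.sqrt(2 * np.log(t + 1) / obs)))
        pulls[k] += 1
        r = float(rng.normal(means[k], 0.1))
        inflight.append((t + delays[k], k, r))
    return pulls
```

With a clear reward gap, the policy still concentrates its pulls on the best arm even when that arm's feedback is the slowest to arrive, which is the qualitative behaviour these delay-robust analyses formalise.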
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.