Delayed Feedback in Generalised Linear Bandits Revisited
- URL: http://arxiv.org/abs/2207.10786v4
- Date: Tue, 11 Apr 2023 11:52:55 GMT
- Title: Delayed Feedback in Generalised Linear Bandits Revisited
- Authors: Benjamin Howson, Ciara Pike-Burke, Sarah Filippi
- Abstract summary: We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner.
We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon.
- Score: 5.349852254138085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The stochastic generalised linear bandit is a well-understood model for
sequential decision-making problems, with many algorithms achieving
near-optimal regret guarantees under immediate feedback. However, the stringent
requirement for immediate rewards is unmet in many real-world applications
where the reward is almost always delayed. We study the phenomenon of delayed
rewards in generalised linear bandits in a theoretical manner. We show that a
natural adaptation of an optimistic algorithm to the delayed feedback achieves
a regret bound where the penalty for the delays is independent of the horizon.
This result significantly improves upon existing work, where the best known
regret bound has the delay penalty increasing with the horizon. We verify our
theoretical results through experiments on simulated data.
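The mechanism the abstract describes — an optimistic algorithm that simply waits for delayed rewards and updates its estimator only with feedback that has arrived — can be illustrated with a minimal simulation. The sketch below is a generic UCB-style linear bandit under random delays, not the authors' exact algorithm or analysis; the arm set, noise level, and delay model are illustrative assumptions.

```python
import numpy as np

def delayed_ucb_bandit(arms, theta_star, horizon, max_delay, alpha=1.0, seed=0):
    """Optimistic (UCB-style) linear bandit with delayed rewards.

    A reward generated at round t only enters the least-squares
    estimator once its random delay has elapsed; until then it sits
    in a `pending` queue. This mirrors the natural adaptation to
    delayed feedback discussed in the abstract, in simplified form.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = np.eye(d)                # regularised design matrix
    b = np.zeros(d)              # sum of x * r over *observed* rewards
    pending = []                 # (arrival_round, action, reward)
    total_reward = 0.0
    for t in range(horizon):
        # absorb any feedback whose delay has elapsed
        ready = [(s, x, r) for (s, x, r) in pending if s <= t]
        pending = [(s, x, r) for (s, x, r) in pending if s > t]
        for _, x, r in ready:
            V += np.outer(x, x)
            b += r * x
        theta_hat = np.linalg.solve(V, b)
        V_inv = np.linalg.inv(V)
        # optimistic index: estimate plus exploration bonus
        ucb = arms @ theta_hat + alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        x = arms[int(np.argmax(ucb))]
        r = float(x @ theta_star + 0.1 * rng.standard_normal())
        total_reward += r
        # reward becomes visible after a random delay in [1, max_delay]
        pending.append((t + 1 + rng.integers(0, max_delay), x, r))
    return total_reward
```

Running this with a clear gap between arms shows the learner still concentrating on the optimal action despite the delays, consistent with the intuition that delays cost only an additive penalty.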
Related papers
- Biased Dueling Bandits with Stochastic Delayed Feedback [6.167074802065416]
We present two algorithms designed to handle situations involving delay.
Our first algorithm, requiring complete delay distribution information, achieves the optimal regret bound for the dueling bandit problem when there is no delay.
The second algorithm is tailored for situations where the delay distribution is unknown and only its expected value is available.
arXiv Detail & Related papers (2024-08-26T19:49:12Z) - Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays [25.757803459592104]
We study the semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints.
This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available.
We present new bandit algorithms to select arms under unrestricted feedback delays based on their merits.
arXiv Detail & Related papers (2024-07-22T07:36:27Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Posterior Sampling with Delayed Feedback for Reinforcement Learning with
Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3 H^3 T} + d^2 H^2 \mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z) - A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z) - Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards [0.4194295877935867]
In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays.
We propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards.
We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem.
arXiv Detail & Related papers (2023-03-01T16:22:22Z) - Reward Imputation with Sketching for Contextual Batched Bandits [48.80803376405073]
Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode.
Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information.
We propose Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching.
arXiv Detail & Related papers (2022-10-13T04:26:06Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When
Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB).
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Nonstochastic Bandits and Experts with Arm-Dependent Delays [17.272515865592542]
We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.
arXiv Detail & Related papers (2021-11-02T13:36:11Z) - Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z) - Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called the PatientBandits.
We provide both problem-dependent and problem-independent upper bounds on the regret, as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
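Several of the papers above share the same setting: a classical index policy run when each arm's feedback arrives only after an arm-dependent delay. The sketch below is a generic UCB1 variant in that setting, intended only to illustrate the mechanics; it is not the PatientBandits algorithm, and the fixed per-arm delays and Gaussian rewards are illustrative assumptions.

```python
import numpy as np

def ucb1_with_arm_delays(means, delays, horizon, seed=0):
    """UCB1 under arm-dependent feedback delays (illustrative sketch).

    Pulling arm k at round t yields a reward that only becomes
    observable at round t + delays[k]; the UCB index is computed
    from arrived feedback alone. Returns the pull counts per arm.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)              # arrived observations per arm
    sums = np.zeros(K)                # arrived reward totals per arm
    pulls = np.zeros(K, dtype=int)    # pulls issued (arrived or not)
    inflight = []                     # (arrival_round, arm, reward)
    for t in range(horizon):
        arrived = [(s, k, r) for (s, k, r) in inflight if s <= t]
        inflight = [(s, k, r) for (s, k, r) in inflight if s > t]
        for _, k, r in arrived:
            counts[k] += 1
            sums[k] += r
        if np.any(pulls == 0):
            # initialisation: issue one pull of every arm first
            k = int(np.argmin(pulls))
        else:
            obs = np.maximum(counts, 1)  # guard arms with no arrived feedback yet
            k = int(np.argmax(sums / obs + np.sqrt(2 * np.log(t + 1) / obs)))
        pulls[k] += 1
        r = float(rng.normal(means[k], 0.1))
        inflight.append((t + delays[k], k, r))
    return pulls
```

With a clear reward gap, the policy still concentrates its pulls on the best arm even when that arm's feedback is the slowest to arrive, which is the qualitative behaviour these delay-robust analyses formalise.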
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.