Lipschitz Bandits with Stochastic Delayed Feedback
- URL: http://arxiv.org/abs/2510.00309v1
- Date: Tue, 30 Sep 2025 22:07:17 GMT
- Title: Lipschitz Bandits with Stochastic Delayed Feedback
- Authors: Zhongxuan Liu, Yue Kang, Thomas C. M. Lee,
- Abstract summary: We introduce a new problem of Lipschitz bandit in the presence of delayed feedback.<n>We design algorithms that attain sublinear regret guarantees in each setting.<n>We present experimental results to demonstrate the efficiency of our algorithms under various delay scenarios.
- Score: 3.0594138391611967
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit in the presence of stochastic delayed feedback, where the rewards are not observed immediately but after a random delay. We consider both bounded and unbounded stochastic delays, and design algorithms that attain sublinear regret guarantees in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $\tau_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and establish a regret lower bound showing that our method is nearly optimal up to logarithmic factors. Finally, we present experimental results to demonstrate the efficiency of our algorithms under various delay scenarios.
Related papers
- Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation.
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies.
We devise DEZ, a model-based algorithm that optimize over the class of Markov policies.
arXiv Detail & Related papers (2024-04-08T12:19:04Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Posterior Sampling with Delayed Feedback for Reinforcement Learning with
Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $widetildeO(sqrtd3H3 T + d2H2 E[tau]$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z) - A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z) - Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting [71.82716109461967]
We propose an algorithm called Mild-OGD for the full-information case, where delayed gradients are available.<n>We show that the dynamic regret of Mild-OGD can be automatically bounded by $O(sqrtbardT(P_T+1))$ under the in-order assumption.<n>We also develop a bandit variant of Mild-OGD for a more challenging case with only delayed loss values.
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - Delayed Feedback in Generalised Linear Bandits Revisited [5.349852254138085]
We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner.
We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon.
arXiv Detail & Related papers (2022-07-21T23:35:01Z) - Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z) - Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called the PatientBandits.
We provide both problems-dependent and problems-independent bounds on the regret as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.