Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback
- URL: http://arxiv.org/abs/2303.13604v1
- Date: Thu, 23 Mar 2023 18:38:33 GMT
- Title: Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback
- Authors: Mohammad Pedramfar, Vaneet Aggarwal
- Abstract summary: This paper investigates the problem of multiarmed bandits with submodular (in expectation) rewards and full-bandit delayed feedback.
The delayed feedback is composed of components of rewards from past actions, with unknown division among the sub-components.
The considered algorithm is demonstrated to outperform other full-bandit approaches with delayed composite anonymous feedback.
- Score: 39.12903814606534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the problem of combinatorial multiarmed bandits with
stochastic submodular (in expectation) rewards and full-bandit delayed
feedback, where the delayed feedback is assumed to be composite and anonymous.
In other words, the delayed feedback is composed of components of rewards from
past actions, with unknown division among the sub-components. Three models of
delayed feedback are studied: bounded adversarial, stochastic independent, and
stochastic conditionally independent; regret bounds are derived for each of
the delay models. Ignoring the problem-dependent parameters, we show that the
regret bound for all the delay models is $\tilde{O}(T^{2/3} + T^{1/3} \nu)$ for
time horizon $T$, where $\nu$ is a delay parameter defined differently in the
three cases, thus demonstrating that the delay enters the regret additively in
all three delay models. The considered algorithm is demonstrated to outperform
other full-bandit approaches with delayed composite anonymous feedback.
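The composite anonymous feedback model can be made concrete with a small simulation. The sketch below (function and variable names are my own, not from the paper) spreads each round's reward over the following rounds with a random division, so the learner only ever observes per-round aggregates:

```python
import random

def composite_anonymous_feedback(rewards, max_delay, rng):
    """Spread each round's reward over that round and the next
    max_delay - 1 rounds, with a random division that the learner
    never sees; return the per-round aggregate observations."""
    T = len(rewards)
    observed = [0.0] * (T + max_delay - 1)
    for t, r in enumerate(rewards):
        # Random split of round t's reward into max_delay sub-components.
        cuts = sorted(rng.random() for _ in range(max_delay - 1))
        weights = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
        for d, w in enumerate(weights):
            observed[t + d] += w * r  # the d-th piece arrives d rounds later
    return observed

rng = random.Random(0)
obs = composite_anonymous_feedback([1.0, 0.5, 2.0], max_delay=2, rng=rng)
# Each entry of obs mixes pieces of several past rewards, so no single
# observation can be attributed to a specific action.
```

Because the division weights are drawn afresh each round and never revealed, the learner cannot attribute any observed amount to a specific past action, which is exactly what makes the feedback anonymous.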
Related papers
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm, and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{d}T)$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z)
- Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application [17.64363983613468]
In this paper, we consider delayed feedback that arises from multiple users, with no restriction on its internal distribution.
We formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm, MUD-EXP3.
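MUD-EXP3 itself is not reproduced in the blurb above; as orientation, here is a minimal sketch of the standard EXP3 update that it modifies (helper names are my own, and the multi-user delay handling of MUD-EXP3 is not shown):

```python
import math
import random

def exp3_probs(weights, gamma):
    """Mix the weight distribution with uniform exploration."""
    K = len(weights)
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / K for w in weights]

def exp3_update(weights, arm, reward, gamma):
    """Exponential update with an importance-weighted reward estimate,
    which keeps the estimate unbiased despite bandit feedback."""
    K = len(weights)
    p = exp3_probs(weights, gamma)[arm]
    weights[arm] *= math.exp(gamma * (reward / p) / K)

# Toy run: arm 1 secretly pays 1, arm 0 pays 0.
rng = random.Random(0)
weights = [1.0, 1.0]
for _ in range(300):
    probs = exp3_probs(weights, gamma=0.1)
    arm = rng.choices(range(2), weights=probs)[0]
    reward = 1.0 if arm == 1 else 0.0
    exp3_update(weights, arm, reward, gamma=0.1)
```

After the toy run, the weight of the paying arm dominates; a delayed-feedback variant would instead apply `exp3_update` only once an observation actually arrives.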
arXiv Detail & Related papers (2023-10-17T12:08:15Z)
- A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z)
- A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs [18.199326045904996]
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback.
Our novel regret decomposition shows that FTRL remains stable across multiple rounds under mild assumptions on the Hessian of the regularizer.
arXiv Detail & Related papers (2023-05-15T13:21:50Z)
- Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z)
- Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
- Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called PatientBandits.
We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
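PatientBandits is not specified in the blurb above; for context, here is a minimal sketch of the classical UCB1 rule on which UCB-based algorithms of this kind build (helper names are my own, and the delay handling of PatientBandits is not shown):

```python
import math

def ucb1_choose(counts, means, t):
    """Pick the arm maximizing empirical mean plus a confidence bonus;
    arms that have never been pulled are tried first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

def ucb1_update(counts, means, arm, reward):
    """Incremental update of the pull count and running mean."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

# Toy run: arm 1 always pays more, so UCB1 should pull it more often.
counts, means = [0, 0], [0.0, 0.0]
for t in range(1, 201):
    arm = ucb1_choose(counts, means, t)
    ucb1_update(counts, means, arm, reward=[0.2, 0.8][arm])
```

With arm-dependent delays, the same confidence-bonus idea applies, but the statistics for each arm can only be updated once its feedback has actually arrived.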
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.