Nonstochastic Bandits and Experts with Arm-Dependent Delays
- URL: http://arxiv.org/abs/2111.01589v1
- Date: Tue, 2 Nov 2021 13:36:11 GMT
- Title: Nonstochastic Bandits and Experts with Arm-Dependent Delays
- Authors: Dirk van der Hoeven and Nicolò Cesa-Bianchi
- Abstract summary: We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round.
- Score: 17.272515865592542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study nonstochastic bandits and experts in a delayed setting where delays
depend on both time and arms. While the setting in which delays only depend on
time has been extensively studied, the arm-dependent delay setting better
captures real-world applications at the cost of introducing new technical
challenges. In the full information (experts) setting, we design an algorithm
with a first-order regret bound that reveals an interesting trade-off between
delays and losses. We prove a similar first-order regret bound also for the
bandit setting, when the learner is allowed to observe how many losses are
missing. These are the first bounds in the delayed setting that depend on the
losses and delays of the best arm only. When in the bandit setting no
information other than the losses is observed, we still manage to prove a
regret bound through a modification to the algorithm of Zimmert and Seldin
(2020). Our analyses hinge on a novel bound on the drift, measuring how much
better an algorithm can perform when given a look-ahead of one round.
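To make the interaction protocol concrete, here is a minimal Python sketch of an exponential-weights learner facing arm-dependent delays. It is an illustration under assumptions, not the paper's algorithm: the fixed learning rate eta and the names delayed_exp3, loss_fn, and delay_fn are placeholders, and the paper's methods use more refined, adaptive updates.

```python
import math
import random
from collections import defaultdict

def delayed_exp3(T, K, loss_fn, delay_fn, eta=0.05):
    """Exponential-weights learner with arm-dependent delays (sketch).

    loss_fn(t, a)  -> loss in [0, 1] of arm a at round t
    delay_fn(t, a) -> rounds until the loss of (t, a) is observed
    """
    log_w = [0.0] * K               # log-weights, for numerical stability
    pending = defaultdict(list)     # arrival round -> [(arm, prob, loss)]
    total_loss = 0.0
    for t in range(T):
        # Sampling distribution from the current weights.
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        s = sum(w)
        p = [x / s for x in w]
        a = random.choices(range(K), weights=p)[0]
        loss = loss_fn(t, a)
        total_loss += loss
        # The loss of (t, a) is only revealed delay_fn(t, a) rounds later.
        pending[t + delay_fn(t, a)].append((a, p[a], loss))
        # Apply whatever feedback happens to arrive at round t.
        for arm, prob, obs in pending.pop(t, []):
            log_w[arm] -= eta * obs / prob   # importance-weighted update
    return total_loss
```

With delay_fn returning 0 everywhere this collapses to standard Exp3; a call such as delayed_exp3(10000, 3, lambda t, a: random.random(), lambda t, a: 5 * a) simulates delays that grow with the arm index, which is the arm-dependent regime the abstract targets.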
Related papers
- Biased Dueling Bandits with Stochastic Delayed Feedback [6.167074802065416]
We present two algorithms for dueling bandits with delayed feedback.
Our first algorithm, which requires complete knowledge of the delay distribution, achieves the optimal regret bound of the delay-free dueling bandit problem.
The second algorithm is tailored for situations where the delay distribution is unknown and only the expected delay is available.
arXiv Detail & Related papers (2024-08-26T19:49:12Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent not only selects an arm at each time step but can also abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z)
- Delayed Feedback in Generalised Linear Bandits Revisited [5.349852254138085]
We provide a theoretical study of delayed rewards in generalised linear bandits.
We show that a natural adaptation of an optimistic algorithm to the delayed feedback achieves a regret bound where the penalty for the delays is independent of the horizon.
arXiv Detail & Related papers (2022-07-21T23:35:01Z)
- Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely the Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB).
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z)
- Nonstochastic Bandits with Composite Anonymous Feedback [41.38921728211769]
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player.
The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions.
arXiv Detail & Related papers (2021-12-06T08:44:04Z)
- Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
- Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z)
- Adapting to Delays and Data in Adversarial Multi-Armed Bandits [7.310043452300736]
We analyze variants of the Exp3 algorithm that tune their step size using only information available at the time of each decision.
We obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses; a hypothetical tuning rule is sketched below.
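A hypothetical sketch of such a tuning rule, in the spirit of delay-adaptive Exp3: shrink the step size as the cumulative observed delay and loss grow. The formula and the name adaptive_eta are assumptions for illustration, not the paper's exact rule.

```python
import math

def adaptive_eta(observed_delays, observed_losses):
    """Hypothetical delay- and data-adaptive step size for Exp3-style
    updates: uses only quantities known at decision time, namely the
    delays and losses of the feedback that has already arrived."""
    D = sum(observed_delays)   # cumulative delay observed so far
    L = sum(observed_losses)   # cumulative loss observed so far
    return 1.0 / math.sqrt(1.0 + D + L)
```

The 1 inside the square root only guards against a zero denominator before any feedback has arrived.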
arXiv Detail & Related papers (2020-10-12T20:53:52Z)
- Stochastic bandits with arm-dependent delays [102.63128271054741]
We propose a simple but efficient UCB-based algorithm called PatientBandits.
We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds; a generic delayed-UCB index is sketched below.
arXiv Detail & Related papers (2020-06-18T12:13:58Z)
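For illustration, a generic optimistic index on delayed data, under the assumption that only feedback which has already arrived enters the arm statistics; the function name and the exploration constant are illustrative, not the exact PatientBandits rule.

```python
import math

def delayed_ucb_index(empirical_mean, n_observed, t):
    """Generic UCB index on delayed data: the mean and the count include
    only rewards that have arrived by round t (illustrative sketch)."""
    if n_observed == 0:
        return float("inf")   # force initial exploration of unseen arms
    return empirical_mean + math.sqrt(2.0 * math.log(t + 1) / n_observed)
```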
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.