No Discounted-Regret Learning in Adversarial Bandits with Delays
- URL: http://arxiv.org/abs/2103.04550v1
- Date: Mon, 8 Mar 2021 05:15:31 GMT
- Title: No Discounted-Regret Learning in Adversarial Bandits with Delays
- Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
- Abstract summary: We show that the expected discounted ergodic distribution of play converges to the set of coarse correlated equilibrium (CCE) if the algorithms have "no discounted-regret".
For a zero-sum game, we show that no discounted-regret is sufficient for the discounted ergodic average of play to converge to the set of Nash equilibria.
- Score: 40.670563413892154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a player that in each round $t$ out of $T$ rounds chooses an action
and observes the incurred cost after a delay of $d_{t}$ rounds. The cost
functions and the delay sequence are chosen by an adversary. We show that even
if the players' algorithms lose their "no regret" property due to too large
delays, the expected discounted ergodic distribution of play converges to the
set of coarse correlated equilibrium (CCE) if the algorithms have "no
discounted-regret". For a zero-sum game, we show that no discounted-regret is
sufficient for the discounted ergodic average of play to converge to the set of
Nash equilibria. We prove that the FKM algorithm with $n$ dimensions achieves a
regret of
$O\left(nT^{\frac{3}{4}}+\sqrt{n}T^{\frac{1}{3}}D^{\frac{1}{3}}\right)$ and the
EXP3 algorithm with $K$ arms achieves a regret of $O\left(\sqrt{\ln
K\left(KT+D\right)}\right)$ even when $D=\sum_{t=1}^{T}d_{t}$ and $T$ are
unknown. These bounds use a novel doubling trick that provably retains the
regret bound for when $D$ and $T$ are known. Using these bounds, we show that
EXP3 and FKM have no discounted-regret even for $d_{t}=O\left(t\log t\right)$.
Therefore, the CCE of a finite or convex unknown game can be approximated via
simulation even when only delayed bandit feedback is available.
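For intuition, the "no discounted-regret" property can be formalized as follows. This is a hedged sketch assuming a geometric discount factor; the paper introduces its own discount weighting, which may differ from the form written here.

```latex
% Assumed formalization with a geometric discount factor 0 < \gamma < 1;
% the paper's exact discount weights may differ from this sketch.
\[
  \limsup_{T\to\infty}\;
  \frac{\mathbb{E}\left[\sum_{t=1}^{T}\gamma^{T-t}
        \left(\ell_{t}(a_{t})-\ell_{t}(a^{\star})\right)\right]}
       {\sum_{t=1}^{T}\gamma^{T-t}}\le 0
  \quad\text{for every fixed action } a^{\star}.
\]
```

The EXP3 claim can also be made concrete with a short sketch. The Python below is an illustrative implementation of EXP3 with delayed bandit feedback, using the fixed learning rate suggested by the $O\left(\sqrt{\ln K\left(KT+D\right)}\right)$ bound for the case where $D$ and $T$ are known; the function name, the `delays` array, and `loss_fn` are assumptions for illustration, not the authors' interface or their exact algorithm.

```python
import numpy as np

def exp3_delayed(K, T, delays, loss_fn, rng=None):
    """EXP3 where the cost of the action played in round t arrives
    delays[t] rounds later. Illustrative sketch, not the paper's algorithm."""
    rng = rng or np.random.default_rng()
    # With D = sum(d_t) and T known, the rate sqrt(ln K / (KT + D))
    # matches the O(sqrt(ln K (KT + D))) regret bound in the abstract.
    D = sum(delays)
    eta = np.sqrt(np.log(K) / (K * T + D))

    log_w = np.zeros(K)          # log-weights, for numerical stability
    pending = {}                 # arrival round -> [(arm, prob, loss), ...]

    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)
        loss = loss_fn(t, arm)               # cost in [0, 1], observed later
        pending.setdefault(t + delays[t], []).append((arm, p[arm], loss))

        # Apply whatever feedback arrives this round, importance-weighted
        # by the sampling probability at the time the arm was played.
        for a, pa, l in pending.pop(t, []):
            log_w[a] -= eta * l / pa
    return log_w
```

When $D$ and $T$ are unknown, the paper layers its doubling trick on top. A natural reading is to restart the algorithm with a doubled guess for $KT+D$ once the observed $Kt$ plus the delay accumulated so far exceeds the current guess, but the precise restart condition and its analysis are the paper's contribution and are not reproduced in the sketch above.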
Related papers
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the adversarial combinatorial bandit problem with a switching cost $\lambda$ for each change of a selected arm between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm, and prove that it enjoys a regret bound of $O\left(\sqrt{n}T^{3/4}+\sqrt{dT}\right)$ in general.
We show that the proposed algorithm can improve the regret bound to $O\left((nT)^{2/3}\log^{1/3}T+d\log T\right)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z)
- Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits [3.5955736977697073]
In the single-pass setting with $K$ arms and $T$ trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any algorithm with $o(K)$ memory.
In this paper, we improve the regret lower bound to $\Omega(K^{1/3}T^{2/3}\log^{1/3}(T))$ for algorithms with $o(K)$ memory.
We show that the proposed algorithms consistently outperform the benchmark uniform exploration algorithm by a large margin, and on occasion, reduce the regret by up to 70%.
arXiv Detail & Related papers (2023-06-03T22:41:44Z)
- A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback [25.68113242132723]
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback.
It simultaneously achieves a near-optimal adversarial regret guarantee in the setting with fixed delays.
We also present an extension of the algorithm to the case of arbitrary delays.
arXiv Detail & Related papers (2022-06-29T20:49:45Z)
- Better Best of Both Worlds Bounds for Bandits with Switching Costs [37.71741656687868]
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021.
We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting.
arXiv Detail & Related papers (2022-06-07T08:22:56Z)
- Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback [10.179145161000315]
We study the adversarial bandit problem with composite anonymous delayed feedback.
In this setting, losses of an action are split into $d$ components, spreading over consecutive rounds after the action is chosen.
We show that the non-oblivious setting incurs $\Omega(T)$ pseudo-regret even when the loss sequence has bounded memory.
arXiv Detail & Related papers (2022-04-27T08:32:35Z)
- Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games [121.50979258049135]
We show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time $T$ are bounded by $O(\log T)$.
Our proposed learning dynamics combine, in a novel way, optimistic regularized learning with the use of self-concordant barriers.
arXiv Detail & Related papers (2022-04-25T03:20:16Z)
- Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret minimization in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only relative binary 'win-loss' feedback for this pair.
Our main result is an algorithm that minimizes the $T$-round regret compared to the Borda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
- Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}\left(T\sqrt{kN}+\sqrt{dkNT}\right)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)