First- and Second-Order Bounds for Adversarial Linear Contextual Bandits
- URL: http://arxiv.org/abs/2305.00832v3
- Date: Wed, 24 May 2023 12:27:03 GMT
- Title: First- and Second-Order Bounds for Adversarial Linear Contextual Bandits
- Authors: Julia Olkhovskaya, Jack Mayo, Tim van Erven, Gergely Neu and Chen-Yu
Wei
- Abstract summary: We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction.
Since $V_T$ or $L_T^*$ may be significantly smaller than $T$, these improve over the worst-case regret whenever the environment is relatively benign.
- Score: 22.367921675238318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the adversarial linear contextual bandit setting, which allows
for the loss functions associated with each of $K$ arms to change over time
without restriction. Assuming the $d$-dimensional contexts are drawn from a
fixed known distribution, the worst-case expected regret over the course of $T$
rounds is known to scale as $\tilde O(\sqrt{Kd T})$. Under the additional
assumption that the density of the contexts is log-concave, we obtain a
second-order bound of order $\tilde O(K\sqrt{d V_T})$ in terms of the
cumulative second moment of the learner's losses $V_T$, and a closely related
first-order bound of order $\tilde O(K\sqrt{d L_T^*})$ in terms of the
cumulative loss of the best policy $L_T^*$. Since $V_T$ or $L_T^*$ may be
significantly smaller than $T$, these improve over the worst-case regret
whenever the environment is relatively benign. Our results are obtained using a
truncated version of the continuous exponential weights algorithm over the
probability simplex, which we analyse by exploiting a novel connection to the
linear bandit setting without contexts.
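As a rough illustration of the algorithmic ingredient named above, the following Python sketch runs a sampling-based approximation of continuous exponential weights over the probability simplex with truncated importance-weighted loss estimates. It drops the contexts entirely (echoing the connection to the context-free linear bandit setting mentioned in the abstract), and the learning rate `eta`, truncation level `m`, candidate-sampling scheme, and loss model are illustrative assumptions, not the authors' actual construction.

```python
# Illustrative sketch only: a sampling-based approximation of continuous
# exponential weights over the probability simplex, with truncated
# importance-weighted loss estimates. All constants below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 1000
eta, m = 0.05, 20.0          # learning rate and truncation level (assumed)
L_hat = np.zeros(K)          # cumulative estimated losses per arm

def sample_simplex_point(L_hat, n_candidates=2048):
    """Approximate a draw w ~ exp(-eta * <w, L_hat>) over the simplex via
    self-normalized importance sampling with uniform (Dirichlet(1)) candidates."""
    cands = rng.dirichlet(np.ones(K), size=n_candidates)   # uniform on simplex
    logw = -eta * (cands @ L_hat)
    probs = np.exp(logw - logw.max())
    probs /= probs.sum()
    return cands[rng.choice(n_candidates, p=probs)]

total_loss = 0.0
for t in range(T):
    # Adversary's losses for this round (hidden from the learner; placeholder).
    losses = rng.uniform(0.0, 1.0, size=K)

    w = sample_simplex_point(L_hat)     # action distribution for this round
    a = rng.choice(K, p=w)              # play an arm
    loss = losses[a]
    total_loss += loss

    # Truncated importance-weighted estimate: only the played arm's coordinate
    # is non-zero, and the estimate is clipped at m to control its variance.
    est = np.zeros(K)
    est[a] = min(loss / max(w[a], 1e-12), m)
    L_hat += est

print(f"average loss over {T} rounds: {total_loss / T:.3f}")
```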
Related papers
- Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Context-lumpable stochastic bandits [49.024050919419366]
We consider a contextual bandit problem with $S$ contexts and $K$ actions.
We give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde O(r(S+K)/\epsilon^2)$ samples.
In the regret setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde O(\sqrt{r^3(S+K)T})$.
arXiv Detail & Related papers (2023-06-22T17:20:30Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal O}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde O(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
We devise an online learning algorithm and provide guarantees on its expected regret.
This regret at time $T$ is upper bounded (i) by $\widetilde O((d_u+d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown.
arXiv Detail & Related papers (2021-09-29T14:07:21Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional corruption-dependent additive terms, respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called Optimistic Pessimistic Linear Bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed
Payoffs [35.988644745703645]
We analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite moments of order $1+\epsilon$.
We propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde O(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$.
arXiv Detail & Related papers (2020-04-28T13:01:38Z) - Efficient and Robust Algorithms for Adversarial Linear Contextual
Bandits [13.32560004325655]
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem.
Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random, we develop efficient algorithms based on the classic Exp3 algorithm (see the sketch below).
arXiv Detail & Related papers (2020-02-01T22:49:46Z)
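Since the last entry builds on Exp3, here is a bare-bones sketch of the classic Exp3 update for the $K$-armed adversarial bandit, included only as the building block that those contextual algorithms extend; the learning rate and the placeholder random losses are assumptions for the demo, not details taken from the paper.

```python
# Minimal sketch of classic Exp3 for the K-armed adversarial bandit.
# Shown as the building block only; constants and losses are placeholders.
import numpy as np

rng = np.random.default_rng(1)
K, T = 10, 5000
eta = np.sqrt(2.0 * np.log(K) / (K * T))   # a standard Exp3 learning rate
L_hat = np.zeros(K)                        # cumulative importance-weighted losses

for t in range(T):
    # Exponential weights distribution over arms.
    logits = -eta * L_hat
    p = np.exp(logits - logits.max())
    p /= p.sum()

    a = rng.choice(K, p=p)                 # sample an arm
    loss = rng.uniform(0.0, 1.0)           # placeholder loss for the played arm

    # Unbiased importance-weighted estimate of the played arm's loss.
    L_hat[a] += loss / p[a]

print("final arm-selection probabilities:", np.round(p, 3))
```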