Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual
Bandits
- URL: http://arxiv.org/abs/2309.00814v1
- Date: Sat, 2 Sep 2023 03:49:05 GMT
- Title: Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual
Bandits
- Authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert
- Abstract summary: We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set is drawn from a fixed distribution.
Existing methods for this problem either require access to a simulator to generate free i.i.d. contexts, achieve a sub-optimal regret, or are computationally inefficient.
We greatly improve these results by achieving a regret of $\widetilde{O}(\sqrt{T})$ without a simulator, maintaining computational efficiency when the action set in each round is small.
- Score: 30.337826346496385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the adversarial linear contextual bandit problem, where the loss
vectors are selected fully adversarially and the per-round action set (i.e. the
context) is drawn from a fixed distribution. Existing methods for this problem
either require access to a simulator to generate free i.i.d. contexts, achieve
a sub-optimal regret no better than $\widetilde{O}(T^{\frac{5}{6}})$, or are
computationally inefficient. We greatly improve these results by achieving a
regret of $\widetilde{O}(\sqrt{T})$ without a simulator, while maintaining
computational efficiency when the action set in each round is small. In the
special case of sleeping bandits with adversarial loss and stochastic arm
availability, our result answers affirmatively the open question by Saha et al.
[2020] on whether there exists a polynomial-time algorithm with
$poly(d)\sqrt{T}$ regret. Our approach naturally handles the case where the
loss is linear up to an additive misspecification error, and our regret shows
near-optimal dependence on the magnitude of the error.
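To make the setting concrete, below is a minimal sketch of the interaction protocol described in the abstract: each round, an action set (the context) is drawn from a fixed distribution, the learner picks an action, and the loss is the inner product with a fully adversarially chosen loss vector. The environment hooks (`draw_action_set`, `loss_vector`, `policy`) and the benchmark used here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def run_protocol(T, draw_action_set, loss_vector, policy, rng=None):
    """Interaction protocol for adversarial linear contextual bandits.

    draw_action_set(rng) -> (K, d) array: per-round action set (the context),
                            drawn i.i.d. from a fixed distribution.
    loss_vector(t)       -> (d,) array: loss vector chosen fully adversarially.
    policy(A, history)   -> int: index of the action to play given the set A.
    """
    rng = rng or np.random.default_rng(0)
    history, regret = [], 0.0
    for t in range(T):
        A = draw_action_set(rng)            # context = feasible action set
        losses = A @ loss_vector(t)         # linear loss of every action in A
        a = policy(A, history)
        history.append((A, a, losses[a]))   # bandit feedback: chosen loss only
        # Optimistic per-round benchmark; this upper-bounds the regret against
        # the best fixed policy mapping contexts to actions.
        regret += losses[a] - losses.min()
    return regret
```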
Related papers
- Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption.
We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both regimes.
arXiv Detail & Related papers (2023-12-27T09:32:18Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound of $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
Policy regret is a well-established notion for measuring the performance of an online learning algorithm against an adaptive adversary.
We study restrictions on the adversary that enable efficient minimization of the complete policy regret.
We provide an algorithm that w.h.p. achieves a complete policy regret guarantee of $\tilde{\mathcal{O}}(mK\sqrt{T})$, where the $\tilde{\mathcal{O}}$ notation hides only logarithmic factors.
arXiv Detail & Related papers (2022-04-24T03:10:27Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Adversarial Combinatorial Bandits with General Non-linear Reward
Functions [29.775097827244853]
We study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear bandits.
We show that with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde{\Theta}_d(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial.
arXiv Detail & Related papers (2021-01-05T00:56:27Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms that depend on the corruption level $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Efficient and Robust Algorithms for Adversarial Linear Contextual
Bandits [13.32560004325655]
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem.
Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random, we develop efficient algorithms based on the classic Exp3 algorithm.
arXiv Detail & Related papers (2020-02-01T22:49:46Z)
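For reference, here is a minimal sketch of the classic Exp3 update that the last entry above builds on, stated for a plain $K$-armed adversarial bandit with losses in $[0, 1]$; it is illustrative only and not the contextual algorithm of any paper in this list.

```python
import numpy as np

def exp3(T, K, loss_fn, rng=None):
    """Classic Exp3: exponential weights with importance-weighted loss estimates."""
    rng = rng or np.random.default_rng(0)
    eta = np.sqrt(np.log(K) / (T * K))     # standard learning-rate tuning
    log_w = np.zeros(K)                    # log-weights for numerical stability
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                       # sampling distribution over arms
        arm = rng.choice(K, p=p)
        loss = loss_fn(t, arm)             # adversary's loss for the pulled arm
        total += loss
        log_w[arm] -= eta * loss / p[arm]  # unbiased estimate updates chosen arm only
    return total
```

With this standard tuning, Exp3 attains $O(\sqrt{TK\log K})$ expected regret; the contextual variants above replace the fixed arm index with per-round feature vectors.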