Efficient and Robust Algorithms for Adversarial Linear Contextual
Bandits
- URL: http://arxiv.org/abs/2002.00287v3
- Date: Tue, 24 May 2022 14:05:52 GMT
- Title: Efficient and Robust Algorithms for Adversarial Linear Contextual
Bandits
- Authors: Gergely Neu, Julia Olkhovskaya
- Abstract summary: We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem.
Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random, we develop efficient algorithms based on the classic Exp3 algorithm.
- Score: 13.32560004325655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider an adversarial variant of the classic $K$-armed linear contextual
bandit problem where the sequence of loss functions associated with each arm
are allowed to change without restriction over time. Under the assumption that
the $d$-dimensional contexts are generated i.i.d.~at random from a known
distribution, we develop computationally efficient algorithms based on the
classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a
regret guarantee of $\widetilde{O}(\sqrt{KdT})$ over $T$ rounds, which matches
the best available bound for this problem. Our second algorithm, RobustLinExp3,
is shown to be robust to misspecification, in that it achieves a regret bound
of $\widetilde{O}((Kd)^{1/3}T^{2/3} + \varepsilon \sqrt{d} T)$ if the true
reward function is linear up to an additive nonlinear error uniformly bounded
in absolute value by $\varepsilon$. To our knowledge, our performance
guarantees constitute the very first results on this problem setting.
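For intuition, here is a minimal sketch of an Exp3-style exponential-weights strategy for this setting. It is not the paper's RealLinExp3: the function name `exp3_linear_sketch`, the learning rate `eta`, and the simplifying assumption of isotropic contexts ($\mathbb{E}[XX^\top] = I$) are illustrative choices, whereas RealLinExp3 additionally estimates per-arm covariance inverses to build its loss estimates.
```python
import numpy as np

def exp3_linear_sketch(contexts, losses, eta, seed=0):
    """Minimal exponential-weights sketch for K-armed adversarial linear
    contextual bandits (hypothetical simplification, not RealLinExp3).

    contexts: (T, d) array of i.i.d. context vectors X_t
    losses:   (T, K) array of per-arm losses; only losses[t, a_t] is "observed"
    """
    rng = np.random.default_rng(seed)
    T, d = contexts.shape
    K = losses.shape[1]
    theta_hat_sum = np.zeros((K, d))      # running sums of estimated loss parameters
    total_loss = 0.0
    for t in range(T):
        x = contexts[t]
        # Exponential weights over arms, scored by estimated cumulative loss <theta_hat_sum[a], x>.
        scores = -eta * (theta_hat_sum @ x)
        p = np.exp(scores - scores.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        loss = losses[t, a]               # bandit feedback: only the played arm's loss
        total_loss += loss
        # Importance-weighted estimate of the played arm's loss parameter;
        # unbiased under the assumed isotropy E[X X^T] = I.
        theta_hat_sum[a] += x * loss / p[a]
    return total_loss
```
The sketch only illustrates the exponential-weights selection and importance-weighted estimation pattern; the covariance estimation that makes the approach efficient and yields the stated regret bounds is developed in the paper itself.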
Related papers
- Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed
Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians (a generic truncation sketch appears after this list).
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z) - Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback [30.337826346496385]
We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback.
We introduce two algorithms that achieve improved regret performance compared to existing approaches.
arXiv Detail & Related papers (2023-10-17T19:43:37Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $\widetilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long...
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear
Bandits [16.103685478563072]
This paper considers linear bandits with general constraints.
The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round.
We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects.
arXiv Detail & Related papers (2021-02-10T07:30:37Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\widetilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Revisiting Modified Greedy Algorithm for Monotone Submodular
Maximization with a Knapsack Constraint [75.85952446237599]
We show that a modified greedy algorithm can achieve an approximation factor of $0.305$.
We derive a data-dependent upper bound on the optimum.
It can also be used to significantly improve the efficiency of such algorithms as branch and bound.
arXiv Detail & Related papers (2020-08-12T15:40:21Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive term that depends on the corruption level $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $\Theta^*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversion (Abbasi-Yadkori et al., 2012) and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)
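As referenced in the heavy-tailed rewards entry above, the following is a generic, hypothetical illustration of reward truncation: clipping each observation before averaging. It is not the algorithm of that paper; the threshold `b` and the function name `truncated_mean` are assumptions made for the example.
```python
import numpy as np

def truncated_mean(rewards, b):
    """Generic truncation sketch for heavy-tailed rewards: clip each
    observation to [-b, b] before averaging (illustrative example only,
    not the algorithm from the paper above)."""
    r = np.asarray(rewards, dtype=float)
    return np.clip(r, -b, b).mean()

# Example: heavy-tailed samples (Student-t with 2 degrees of freedom, true mean 1.0).
rng = np.random.default_rng(0)
samples = rng.standard_t(df=2, size=10_000) + 1.0
print(truncated_mean(samples, b=20.0))   # typically more stable than the raw sample mean
```
Clipping at a threshold bounds the variance of each observation at the cost of a bias, a trade-off that can be controlled when the rewards have a bounded $(1+\epsilon)$-th moment.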