Smooth Non-Stationary Bandits
- URL: http://arxiv.org/abs/2301.12366v2
- Date: Wed, 7 Jun 2023 17:32:00 GMT
- Title: Smooth Non-Stationary Bandits
- Authors: Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier
- Abstract summary: We study a non-stationary two-armed bandits problem where an arm's mean reward is a $beta$-H"older function over (normalized) time.
We show the first separation between the smooth and non-smooth regimes by presenting a policy with $tilde O(T3/5)$ regret for $beta=2$.
- Score: 54.40609666958948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many applications of online decision making, the environment is
non-stationary and it is therefore crucial to use bandit algorithms that handle
changes. Most existing approaches are designed to protect against non-smooth
changes, constrained only by total variation or Lipschitzness over time, where
they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice
environments are often changing {\bf smoothly}, so such algorithms may incur
higher-than-necessary regret in these settings and do not leverage information
on the rate of change. We study a non-stationary two-armed bandits problem
where we assume that an arm's mean reward is a $\beta$-H\"older function over
(normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously
differentiable. We show the first separation between the smooth and non-smooth
regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$.
We complement this result by an $\Omg(T^{(\beta+1)/(2\beta+1)})$ lower bound
for any integer $\beta\ge 1$, which matches our upper bound for $\beta=2$.
Related papers
- Generalized Linear Bandits with Limited Adaptivity [12.112051468737596]
We study the generalized linear contextual bandit problem within the constraints of limited adaptivity.
We present two algorithms, $textttB-GLinCB$ and $textttRS-GLinCB$, that address, respectively, two prevalent limited adaptivity settings.
arXiv Detail & Related papers (2024-04-10T08:47:57Z) - First- and Second-Order Bounds for Adversarial Linear Contextual Bandits [22.367921675238318]
We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction.
Since $V_T$ or $L_T*$ may be significantly smaller than $T$, these improve over the worst-case regret whenever the environment is relatively benign.
arXiv Detail & Related papers (2023-05-01T14:00:15Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $widetildeThetaleft(sqrtdTright)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $widetildemathcal O(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL of Agarwal et al. achieves this goal with a regret overhead of order $widetildeO(sqrtd S T)$ where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only emphlogarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Minimal Expected Regret in Linear Quadratic Control [79.81807680370677]
We devise an online learning algorithm and provide guarantees on its expected regret.
This regret at time $T$ is upper bounded (i) by $widetildeO((d_u+d_x)sqrtd_xT)$ when $A$ and $B$ are unknown.
arXiv Detail & Related papers (2021-09-29T14:07:21Z) - An Algorithm for Stochastic and Adversarial Bandits with Switching Costs [10.549307055348596]
We propose an algorithm for adversarial multiarmed bandits with switching costs, where the algorithm pays a price $lambda$ every time it switches the arm being played.
Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon.
arXiv Detail & Related papers (2021-02-19T11:03:51Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Smooth Bandit Optimization: Generalization to H\"older Space [37.15553727896912]
We consider bandit optimization of a smooth reward function, where the goal is cumulative regret.
Our main result is in generalization of the reward function to H"older space with exponent $alpha>1$.
We show that it achieves regret rate that matches the existing lower bound for adaptation within the $alphaleq 1$ subset.
arXiv Detail & Related papers (2020-12-11T01:43:25Z) - Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary win-loss' feedback for this pair.
Our main result is an algorithm whose $T$-round regret compared to the emphBorda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.