Non-stationary Linear Bandits Revisited
- URL: http://arxiv.org/abs/2103.05324v1
- Date: Tue, 9 Mar 2021 10:07:17 GMT
- Title: Non-stationary Linear Bandits Revisited
- Authors: Peng Zhao and Lijun Zhang
- Abstract summary: Non-stationary linear bandits are a variant of linear bandits with a time-varying underlying regression parameter.
We prove an $\widetilde{O}(T^{3/4}(1+P_T)^{1/4})$ dynamic regret for these algorithms, slightly worse than the anticipated rate.
- Score: 26.082923174615495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this note, we revisit non-stationary linear bandits, a variant of
stochastic linear bandits with a time-varying underlying regression parameter.
Existing studies develop various algorithms and show that they enjoy an
$\widetilde{O}(T^{2/3}(1+P_T)^{1/3})$ dynamic regret, where $T$ is the time
horizon and $P_T$ is the path-length that measures the fluctuation of the
evolving unknown parameter. However, we discover that a serious technical flaw
makes the argument ungrounded. We revisit the analysis and present a fix.
Without modifying original algorithms, we can prove an
$\widetilde{O}(T^{3/4}(1+P_T)^{1/4})$ dynamic regret for these algorithms,
slightly worse than the anticipated rate. We also show some
impossibility results for the key quantity concerned in the regret analysis.
Note that the above dynamic regret guarantee requires an oracle knowledge of
the path-length $P_T$. By combining with the bandit-over-bandit mechanism, we can
also achieve the same guarantee in a parameter-free way.
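To make the setting concrete, below is a minimal sketch of the restart strategy underlying the kind of algorithms this note analyzes: run ridge-regression-based LinUCB within an epoch and reset every $H$ rounds so that observations from a drifted parameter are discarded. The class name, the epoch-length tuning, and the confidence width `beta` are illustrative assumptions, not the paper's exact algorithm or constants; the parameter-free variant additionally runs a bandit-over-bandit meta-layer over candidate epoch lengths, which is omitted here.

```python
import numpy as np

class RestartedLinUCB:
    """Restart-based LinUCB sketch for a drifting regression parameter."""

    def __init__(self, dim, horizon, path_length, reg=1.0, beta=1.0):
        self.d = dim
        self.reg = reg
        self.beta = beta
        # Illustrative epoch length; the note's analysis couples this tuning
        # to the final T^{3/4}(1+P_T)^{1/4} rate, and the exact choice differs.
        self.H = max(1, int(round((horizon / (1.0 + path_length)) ** 0.5)))
        self._reset()

    def _reset(self):
        # Ridge-regression statistics: V = reg*I + sum x x^T, b = sum r x.
        self.V = self.reg * np.eye(self.d)
        self.b = np.zeros(self.d)
        self.t_in_epoch = 0

    def select(self, arms):
        # Optimistic (UCB) score per arm: <theta_hat, x> + beta * ||x||_{V^{-1}}.
        theta_hat = np.linalg.solve(self.V, self.b)
        V_inv = np.linalg.inv(self.V)
        scores = [x @ theta_hat + self.beta * np.sqrt(x @ V_inv @ x)
                  for x in arms]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.V += np.outer(x, x)
        self.b += reward * x
        self.t_in_epoch += 1
        if self.t_in_epoch >= self.H:   # restart: drop data from a stale theta
            self._reset()

# Illustrative run with a slowly drifting unit-norm parameter (hypothetical setup).
rng = np.random.default_rng(0)
T, d = 2000, 5
alg = RestartedLinUCB(dim=d, horizon=T, path_length=5.0)
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
for t in range(T):
    arms = [v / np.linalg.norm(v) for v in rng.normal(size=(10, d))]
    i = alg.select(arms)
    r = arms[i] @ theta + 0.1 * rng.normal()
    alg.update(arms[i], r)
    theta = theta + 0.002 * rng.normal(size=d)   # drift; contributes to P_T
    theta /= np.linalg.norm(theta)
```

Restarting trades off estimation accuracy within an epoch against staleness across epochs, which is exactly the tension the note's corrected analysis quantifies through the path-length $P_T$.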
Related papers
- Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
We propose and analyze TRAiL, a regret-optimal forced exploration algorithm for linear bandits.
TRAiL ensures an $\Omega(\sqrt{T})$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix.
We characterize an $\Omega(\sqrt{T})$ minimax lower bound for any algorithm on the expected regret.
arXiv Detail & Related papers (2024-11-19T01:08:13Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Linear bandits with polylogarithmic minimax regret [8.97780713904412]
We study a noise model for linear bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector.
We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast to the square-root scaling of this regret for typical bandit algorithms.
arXiv Detail & Related papers (2024-02-19T10:56:47Z)
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z)
- Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal{O}}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z)
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms that depend on the corruption level $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition [38.28925339231888]
We develop the first algorithm with a "best-of-both-worlds" guarantee.
It achieves $\mathcal{O}(\log T)$ regret when the losses are stochastic, and $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when they are adversarial.
More generally, it achieves $\widetilde{\mathcal{O}}(\sqrt{C})$ regret in an intermediate setting where the losses are corrupted by a total amount of $C$.
arXiv Detail & Related papers (2020-06-10T01:59:34Z)
- A new regret analysis for Adam-type algorithms [78.825194932103]
In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_1 \to 0$ schedule.
We propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_1$.
arXiv Detail & Related papers (2020-03-21T19:19:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.