Linear Bandits with Memory: from Rotting to Rising
- URL: http://arxiv.org/abs/2302.08345v2
- Date: Thu, 25 May 2023 07:53:34 GMT
- Title: Linear Bandits with Memory: from Rotting to Rising
- Authors: Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi
- Abstract summary: Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms.
We introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner's past actions in a fixed-size window.
- Score: 5.5969337839476765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonstationary phenomena, such as satiation effects in recommendations, have
mostly been modeled using bandits with finitely many arms. However, the richer
action space provided by linear bandits is often preferred in practice. In this
work, we introduce a novel nonstationary linear bandit model, where current
rewards are influenced by the learner's past actions in a fixed-size window.
Our model, which recovers stationary linear bandits as a special case,
leverages two parameters: the window size $m \ge 0$, and an exponent $\gamma$
that captures the rotting ($\gamma < 0$) or rising ($\gamma > 0$) nature of the
phenomenon. When both $m$ and $\gamma$ are known, we propose and analyze a
variant of OFUL which minimizes regret against cycling policies. By choosing
the cycle length so as to trade off approximation and estimation errors, we
then prove a bound of order
$\sqrt{d}\,(m+1)^{\frac{1}{2}+\max\{\gamma,0\}}\,T^{3/4}$ (ignoring log
factors) on the regret against the optimal sequence of actions, where $T$ is
the horizon and $d$ is the dimension of the linear action space. Through a
bandit model selection approach, our results are extended to the case where $m$
and $\gamma$ are unknown. Finally, we complement our theoretical results with
experiments against natural baselines.
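To make the model concrete, below is a minimal Python sketch of how such rewards could be simulated. The specific functional form (a memory matrix $A = I + \sum_s a_s a_s^\top$ built from the last $m$ actions, raised to the power $\gamma$ and applied to the current action before taking the inner product with the unknown parameter $\theta$) is one plausible instantiation inferred from the abstract; the function name `memory_reward` and all constants are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def memory_reward(past_actions, action, theta, gamma, noise_std=0.1, rng=None):
    """Simulate one reward in a linear bandit with memory (hypothetical form).

    The last m actions build a memory matrix A = I + sum_s a_s a_s^T, and the
    expected reward is <A^gamma @ action, theta>. gamma < 0 models rotting
    (repeating similar actions hurts), gamma > 0 models rising rewards.
    """
    rng = rng or np.random.default_rng()
    d = action.shape[0]
    A = np.eye(d)
    for a in past_actions:              # fixed-size window of the m most recent actions
        A += np.outer(a, a)
    w, V = np.linalg.eigh(A)            # A is symmetric positive definite
    A_gamma = (V * w**gamma) @ V.T      # real matrix power via eigendecomposition
    return float(A_gamma @ action @ theta) + noise_std * rng.standard_normal()

# Toy usage: repeating the same action under gamma < 0 makes rewards rot.
d, m, gamma = 3, 5, -1.0
theta = np.array([1.0, 0.0, 0.0])
a = np.array([1.0, 0.0, 0.0])           # keep pulling the direction aligned with theta
window = []
for t in range(8):
    r = memory_reward(window, a, theta, gamma, noise_std=0.0)
    print(f"t={t}: reward={r:.3f}")
    window = (window + [a])[-m:]        # slide the length-m window
```

With $\gamma = -1$ and the learner repeatedly pulling the same direction, the printed rewards decay as $1/(1+k)$ with the number $k$ of recent pulls until the window saturates at $m$ past actions, illustrating the rotting regime; $\gamma > 0$ would instead make repeated pulls increasingly rewarding.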
Related papers
- Linear bandits with polylogarithmic minimax regret [8.97780713904412]
We study a noise model for linear bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector.
We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast to the square-root scaling of this regret for typical bandit algorithms.
arXiv Detail & Related papers (2024-02-19T10:56:47Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and that the graph has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios -- model selection and clustering in linear bandits.
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal{O}}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Adversarial Combinatorial Bandits with General Non-linear Reward
Functions [29.775097827244853]
We study the adversarial bandit with a known non-linear reward function, extending existing work on adversarial linear bandits.
We show that with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde{\Theta}_d(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial.
arXiv Detail & Related papers (2021-01-05T00:56:27Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed
Payoffs [35.988644745703645]
We analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+\epsilon$ moments.
We propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$.
arXiv Detail & Related papers (2020-04-28T13:01:38Z)