Stochastic Rising Bandits
- URL: http://arxiv.org/abs/2212.03798v1
- Date: Wed, 7 Dec 2022 17:30:45 GMT
- Title: Stochastic Rising Bandits
- Authors: Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli
- Abstract summary: We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
- Score: 40.32303434592863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e.,
those sequential selection techniques able to learn online using only the
feedback given by the chosen option (a.k.a. arm). We study a particular case of
the rested and restless bandits in which the arms' expected payoff is
monotonically non-decreasing. This characteristic allows designing specifically
crafted algorithms that exploit the regularity of the payoffs to provide tight
regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one
for the restless case (R-less-UCB), providing a regret bound depending on the
properties of the instance and, under certain circumstances, of
$\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our
algorithms with state-of-the-art methods for non-stationary MABs over several
synthetically generated tasks and an online model selection problem for a
real-world dataset. Finally, using synthetic and real-world data, we illustrate
the effectiveness of the proposed approaches compared with state-of-the-art
algorithms for the non-stationary bandits.
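Because a rested rising arm's mean payoff only improves as the arm is pulled, old samples systematically underestimate its current value, so an optimistic index can average just the most recent samples. Below is a minimal Python sketch of such a sliding-window UCB index; it is an illustrative simplification under an assumed sub-Gaussian noise scale, not the paper's exact R-ed-UCB estimator.

```python
import math
import random

class RisingRestedUCB:
    """Illustrative UCB-style learner for rested rising bandits.

    Since each arm's expected payoff is non-decreasing in its number of
    pulls, older samples underestimate the current mean, so the index
    averages only the most recent fraction of each arm's history.
    NOTE: simplified sketch, not the paper's exact R-ed-UCB estimator.
    """

    def __init__(self, n_arms, window_fraction=0.5, sigma=0.5):
        self.n_arms = n_arms
        self.window_fraction = window_fraction  # share of recent pulls kept
        self.sigma = sigma                      # assumed sub-Gaussian noise scale
        self.rewards = [[] for _ in range(n_arms)]
        self.t = 0

    def select_arm(self):
        self.t += 1
        for a in range(self.n_arms):            # pull each arm once first
            if not self.rewards[a]:
                return a
        best_arm, best_index = 0, -float("inf")
        for a in range(self.n_arms):
            history = self.rewards[a]
            h = max(1, int(self.window_fraction * len(history)))
            recent_mean = sum(history[-h:]) / h           # window average
            bonus = self.sigma * math.sqrt(2.0 * math.log(self.t) / h)
            if recent_mean + bonus > best_index:
                best_arm, best_index = a, recent_mean + bonus
        return best_arm

    def update(self, arm, reward):
        self.rewards[arm].append(reward)

# Example: two arms whose means rise with the number of their own pulls.
learner = RisingRestedUCB(n_arms=2)
for _ in range(1000):
    arm = learner.select_arm()
    n = len(learner.rewards[arm])
    mean = [0.5, 0.9][arm] * (1 - math.exp(-n / 100))  # rested rising payoff
    learner.update(arm, mean + random.gauss(0, 0.1))
```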
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
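The rejection rule sketched above can be illustrated generically: pull the surviving arms round-robin and drop any arm whose empirical gap to the current leader grows too large. The Python sketch below uses a fixed, hypothetical `reject_threshold`; the paper's sred employs its own adaptive, data-dependent rejection criterion.

```python
def gap_based_elimination(arms, budget, reject_threshold):
    """Generic fixed-budget elimination driven by empirical gaps.

    `arms` maps arm ids to zero-argument reward samplers. In each round
    every surviving arm is pulled once; an arm is rejected as soon as its
    empirical gap to the current leader exceeds `reject_threshold`.
    NOTE: illustrative only; sred's actual rejection rule is adaptive.
    """
    assert budget >= len(arms), "need at least one pull per arm"
    active = set(arms)
    sums = {a: 0.0 for a in arms}
    counts = {a: 0 for a in arms}
    while budget >= len(active) and len(active) > 1:
        for a in list(active):                  # one round-robin pass
            sums[a] += arms[a]()
            counts[a] += 1
            budget -= 1
        means = {a: sums[a] / counts[a] for a in active}
        leader = max(means, key=means.get)
        for a in list(active):                  # reject large-gap arms
            if a != leader and means[leader] - means[a] > reject_threshold:
                active.discard(a)
    return max(active, key=lambda a: sums[a] / counts[a])
```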
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
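As a rough illustration of the stochastic-greedy selection step, the sketch below scores only a random sample of candidate arms at each step instead of scanning them all, which is what makes the approach attractive for large sets of base arms. Here `marginal_gain` is a hypothetical oracle for the estimated gain of adding an arm; this is not the full SGB algorithm.

```python
import math
import random

def stochastic_greedy(base_arms, n, marginal_gain, eps=0.1):
    """Pick a set of `n` arms via stochastic-greedy maximization.

    Each step scores only a random sample of size
    (|base_arms|/n) * log(1/eps) drawn from the remaining arms, which
    retains near-greedy quality at a fraction of the evaluations.
    NOTE: generic sketch; `marginal_gain(chosen, a)` is a hypothetical
    oracle returning the estimated gain of adding arm `a` (arm ids are
    assumed to be sortable, e.g. integers).
    """
    chosen = set()
    remaining = set(base_arms)
    sample_size = max(1, int(len(base_arms) / n * math.log(1 / eps)))
    for _ in range(n):
        candidates = random.sample(sorted(remaining),
                                   min(sample_size, len(remaining)))
        best = max(candidates, key=lambda a: marginal_gain(chosen, a))
        chosen.add(best)
        remaining.discard(best)
    return chosen
```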
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret.
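Spelled out, the reconstructed bound and the deterministic special case mentioned above read:

```latex
\tilde O\!\Big( d \sqrt{\sum_{t=1}^{T} \sigma_t^2} \; + \; d \Big)
\;\xrightarrow{\ \sigma_t = 0 \ \text{for all } t\ }\;
\tilde O(d)
```

That is, when every comparison is deterministic the per-round variances $\sigma_t^2$ vanish, the square-root term drops out, and only the $\tilde O(d)$ term remains.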
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multi-armed bandit algorithms.
We introduce a novel family of adversarial multi-armed bandit algorithms, drawing inspiration from generalized nested logit models.
We furnish sublinear regret bounds for this comprehensive family of algorithms, which encompasses the Exp3 algorithm as a particular case.
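Since the summary singles out Exp3 as a particular case of the family, a standard textbook Exp3 implementation may help fix ideas; the exploration rate below is a conventional choice, not a parameter taken from the paper.

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.1):
    """Standard Exp3 for adversarial bandits (the special case cited above).

    Maintains exponential weights over arms, mixes in uniform exploration
    at rate `gamma`, and feeds back importance-weighted reward estimates.
    `reward_fn(t, arm)` stands in for the environment and returns the
    reward in [0, 1] of pulling `arm` at round `t`.
    """
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        estimate = reward / probs[arm]          # importance weighting
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights
```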
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
- Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models [25.336599480692122]
We consider the regret minimization task in a dueling bandits problem with context information.
We propose a computationally efficient algorithm, $\texttt{CoLSTIM}$, which makes its choice based on imitating the feedback process.
Our experiments demonstrate its superiority over state-of-the-art algorithms for special cases of CoLST models.
arXiv Detail & Related papers (2022-02-09T17:44:19Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit (MAB) problem.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z)
- Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits [61.490254407420906]
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain.
We propose Restless-UCB, a learning policy that follows the explore-then-commit framework.
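As context for the explore-then-commit framework the policy follows, here is a minimal generic sketch; Restless-UCB itself estimates Markov-chain statistics during exploration, whereas this toy version simply averages i.i.d. samples.

```python
def explore_then_commit(arms, horizon, explore_rounds):
    """Generic explore-then-commit policy.

    Phase 1 pulls every arm `explore_rounds` times to build estimates;
    phase 2 commits to the empirically best arm for the remaining rounds.
    `arms` is a list of zero-argument reward samplers, and `horizon` is
    assumed to exceed len(arms) * explore_rounds.
    NOTE: sketch of the plain i.i.d. version, not Restless-UCB itself.
    """
    means = []
    pulls_used = 0
    for sampler in arms:                        # exploration phase
        total = sum(sampler() for _ in range(explore_rounds))
        means.append(total / explore_rounds)
        pulls_used += explore_rounds
    best = max(range(len(arms)), key=lambda a: means[a])
    reward = sum(arms[best]() for _ in range(horizon - pulls_used))
    return best, reward                         # commit phase result
```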
arXiv Detail & Related papers (2020-11-05T05:16:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.