Stochastic Rising Bandits
- URL: http://arxiv.org/abs/2212.03798v1
- Date: Wed, 7 Dec 2022 17:30:45 GMT
- Title: Stochastic Rising Bandits
- Authors: Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli
- Abstract summary: We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
- Score: 40.32303434592863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e.,
those sequential selection techniques able to learn online using only the
feedback given by the chosen option (a.k.a. arm). We study a particular case of
the rested and restless bandits in which the arms' expected payoff is
monotonically non-decreasing. This characteristic allows designing specifically
crafted algorithms that exploit the regularity of the payoffs to provide tight
regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one
for the restless case (R-less-UCB), providing a regret bound depending on the
properties of the instance and, under certain circumstances, of
$\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our
algorithms with state-of-the-art methods for non-stationary MABs over several
synthetically generated tasks and an online model selection problem for a
real-world dataset. Finally, using synthetic and real-world data, we illustrate
the effectiveness of the proposed approaches compared with state-of-the-art
algorithms for the non-stationary bandits.
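Because a rested rising arm's mean payoff only improves as the arm is pulled, old samples systematically underestimate its current value, so an optimistic index can average just the most recent samples. Below is a minimal Python sketch of such a sliding-window UCB index; it is an illustrative simplification under an assumed sub-Gaussian noise scale, not the paper's exact R-ed-UCB estimator.

```python
import math
import random

class RisingRestedUCB:
    """Illustrative UCB-style learner for rested rising bandits.

    Since each arm's expected payoff is non-decreasing in its number of
    pulls, older samples underestimate the current mean, so the index
    averages only the most recent fraction of each arm's history.
    NOTE: simplified sketch, not the paper's exact R-ed-UCB estimator.
    """

    def __init__(self, n_arms, window_fraction=0.5, sigma=0.5):
        self.n_arms = n_arms
        self.window_fraction = window_fraction  # share of recent pulls kept
        self.sigma = sigma                      # assumed sub-Gaussian noise scale
        self.rewards = [[] for _ in range(n_arms)]
        self.t = 0

    def select_arm(self):
        self.t += 1
        for a in range(self.n_arms):            # pull each arm once first
            if not self.rewards[a]:
                return a
        best_arm, best_index = 0, -float("inf")
        for a in range(self.n_arms):
            history = self.rewards[a]
            h = max(1, int(self.window_fraction * len(history)))
            recent_mean = sum(history[-h:]) / h           # window average
            bonus = self.sigma * math.sqrt(2.0 * math.log(self.t) / h)
            if recent_mean + bonus > best_index:
                best_arm, best_index = a, recent_mean + bonus
        return best_arm

    def update(self, arm, reward):
        self.rewards[arm].append(reward)

# Example: two arms whose means rise with the number of their own pulls.
learner = RisingRestedUCB(n_arms=2)
for _ in range(1000):
    arm = learner.select_arm()
    n = len(learner.rewards[arm])
    mean = [0.5, 0.9][arm] * (1 - math.exp(-n / 100))  # rested rising payoff
    learner.update(arm, mean + random.gauss(0, 0.1))
```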
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
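The rejection rule sketched above can be illustrated generically: pull the surviving arms round-robin and drop any arm whose empirical gap to the current leader grows too large. The Python sketch below uses a fixed, hypothetical `reject_threshold`; the paper's sred employs its own adaptive, data-dependent rejection criterion.

```python
def gap_based_elimination(arms, budget, reject_threshold):
    """Generic fixed-budget elimination driven by empirical gaps.

    `arms` maps arm ids to zero-argument reward samplers. In each round
    every surviving arm is pulled once; an arm is rejected as soon as its
    empirical gap to the current leader exceeds `reject_threshold`.
    NOTE: illustrative only; sred's actual rejection rule is adaptive.
    """
    assert budget >= len(arms), "need at least one pull per arm"
    active = set(arms)
    sums = {a: 0.0 for a in arms}
    counts = {a: 0 for a in arms}
    while budget >= len(active) and len(active) > 1:
        for a in list(active):                  # one round-robin pass
            sums[a] += arms[a]()
            counts[a] += 1
            budget -= 1
        means = {a: sums[a] / counts[a] for a in active}
        leader = max(means, key=means.get)
        for a in list(active):                  # reject large-gap arms
            if a != leader and means[leader] - means[a] > reject_threshold:
                active.discard(a)
    return max(active, key=lambda a: sums[a] / counts[a])
```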
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
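As a rough illustration of the stochastic-greedy selection step, the sketch below scores only a random sample of candidate arms at each step instead of scanning them all, which is what makes the approach attractive for large sets of base arms. Here `marginal_gain` is a hypothetical oracle for the estimated gain of adding an arm; this is not the full SGB algorithm.

```python
import math
import random

def stochastic_greedy(base_arms, n, marginal_gain, eps=0.1):
    """Pick a set of `n` arms via stochastic-greedy maximization.

    Each step scores only a random sample of size
    (|base_arms|/n) * log(1/eps) drawn from the remaining arms, which
    retains near-greedy quality at a fraction of the evaluations.
    NOTE: generic sketch; `marginal_gain(chosen, a)` is a hypothetical
    oracle returning the estimated gain of adding arm `a` (arm ids are
    assumed to be sortable, e.g. integers).
    """
    chosen = set()
    remaining = set(base_arms)
    sample_size = max(1, int(len(base_arms) / n * math.log(1 / eps)))
    for _ in range(n):
        candidates = random.sample(sorted(remaining),
                                   min(sample_size, len(remaining)))
        best = max(candidates, key=lambda a: marginal_gain(chosen, a))
        chosen.add(best)
        remaining.discard(best)
    return chosen
```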
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret.
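Spelled out, the reconstructed bound and the deterministic special case mentioned above read:

```latex
\tilde O\!\Big( d \sqrt{\sum_{t=1}^{T} \sigma_t^2} \; + \; d \Big)
\;\xrightarrow{\ \sigma_t = 0 \ \text{for all } t\ }\;
\tilde O(d)
```

That is, when every comparison is deterministic the per-round variances $\sigma_t^2$ vanish, the square-root term drops out, and only the $\tilde O(d)$ term remains.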
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Discrete Choice Multi-Armed Bandits [0.0]
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multi-armed bandit algorithms.
We introduce a novel family of adversarial multi-armed bandit algorithms, drawing inspiration from generalized nested logit models.
We furnish sublinear regret bounds for this comprehensive family of algorithms, which encompasses the Exp3 algorithm as a particular case.
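Since the summary singles out Exp3 as a particular case of the family, a standard textbook Exp3 implementation may help fix ideas; the exploration rate below is a conventional choice, not a parameter taken from the paper.

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.1):
    """Standard Exp3 for adversarial bandits (the special case cited above).

    Maintains exponential weights over arms, mixes in uniform exploration
    at rate `gamma`, and feeds back importance-weighted reward estimates.
    `reward_fn(t, arm)` stands in for the environment and returns the
    reward in [0, 1] of pulling `arm` at round `t`.
    """
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        estimate = reward / probs[arm]          # importance weighting
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights
```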
arXiv Detail & Related papers (2023-10-01T03:41:04Z)
- Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models [25.336599480692122]
We consider the regret minimization task in a dueling bandits problem with context information.
We propose a computationally efficient algorithm, $\texttt{CoLSTIM}$, which makes its choice based on imitating the feedback process.
Our experiments demonstrate its superiority over state-of-the-art algorithms for special cases of CoLST models.
arXiv Detail & Related papers (2022-02-09T17:44:19Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit (MAB) problem.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z)
- Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits [61.490254407420906]
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain.
We propose Restless-UCB, a learning policy that follows the explore-then-commit framework.
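As context for the explore-then-commit framework the policy follows, here is a minimal generic sketch; Restless-UCB itself estimates Markov-chain statistics during exploration, whereas this toy version simply averages i.i.d. samples.

```python
def explore_then_commit(arms, horizon, explore_rounds):
    """Generic explore-then-commit policy.

    Phase 1 pulls every arm `explore_rounds` times to build estimates;
    phase 2 commits to the empirically best arm for the remaining rounds.
    `arms` is a list of zero-argument reward samplers, and `horizon` is
    assumed to exceed len(arms) * explore_rounds.
    NOTE: sketch of the plain i.i.d. version, not Restless-UCB itself.
    """
    means = []
    pulls_used = 0
    for sampler in arms:                        # exploration phase
        total = sum(sampler() for _ in range(explore_rounds))
        means.append(total / explore_rounds)
        pulls_used += explore_rounds
    best = max(range(len(arms)), key=lambda a: means[a])
    reward = sum(arms[best]() for _ in range(horizon - pulls_used))
    return best, reward                         # commit phase result
```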
arXiv Detail & Related papers (2020-11-05T05:16:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.