Thompson Sampling-like Algorithms for Stochastic Rising Bandits
- URL: http://arxiv.org/abs/2505.12092v2
- Date: Tue, 20 May 2025 15:08:08 GMT
- Title: Thompson Sampling-like Algorithms for Stochastic Rising Bandits
- Authors: Marco Fiandri, Alberto Maria Metelli, Francesco Trovò
- Abstract summary: Stochastic rising rested bandit (SRRB) is a setting where the arms' expected rewards increase as they are pulled. This work provides novel regret analyses for Thompson sampling (TS)-like algorithms in SRRBs, highlighting the challenges and providing new technical tools of independent interest.
- Score: 20.143361197609934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic rising rested bandit (SRRB) is a setting where the arms' expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process (e.g., online model selection). Although the bandit literature provides specifically crafted algorithms based on upper-confidence bounds for such a setting, no study of Thompson sampling (TS)-like algorithms has been performed so far. The strong regularity of the expected rewards in the SRRB setting suggests that specific instances may be tackled effectively using adapted and sliding-window TS approaches. This work provides novel regret analyses for such algorithms in SRRBs, highlighting the challenges and providing new technical tools of independent interest. Our results allow us to identify under which assumptions TS-like algorithms succeed in achieving sublinear regret and which properties of the environment govern the complexity of the regret minimization problem when approached with TS. Furthermore, we provide a regret lower bound based on a complexity index we introduce. Finally, we conduct numerical simulations comparing TS-like algorithms with state-of-the-art approaches for SRRBs in synthetic and real-world settings.
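The sliding-window TS idea mentioned in the abstract can be illustrated compactly. Below is a minimal sketch, assuming Bernoulli rewards, a Beta(1,1) prior, and a fixed window over each arm's own pull history; the function names and the windowing rule are illustrative simplifications, not the paper's exact algorithm or its analysis.

```python
import numpy as np

def sliding_window_ts(arms, horizon, window, rng=None):
    """Minimal sliding-window Thompson Sampling sketch for rising rested bandits.

    arms[i] is a callable: arms[i](n) returns a {0, 1} reward whose mean is
    non-decreasing in n, the number of times arm i has been pulled so far
    (the rested rising assumption). Each arm's Beta posterior is fit only
    to its last `window` rewards, so stale (lower) observations are
    discarded as the arm improves.
    """
    rng = rng or np.random.default_rng()
    history = [[] for _ in arms]          # per-arm reward history
    collected = []
    for _ in range(horizon):
        # Thompson step: draw one posterior sample per arm, play the argmax.
        samples = []
        for h in history:
            recent = h[-window:]                      # sliding window
            wins = sum(recent)
            a, b = 1 + wins, 1 + len(recent) - wins   # Beta(1,1) prior
            samples.append(rng.beta(a, b))
        i = int(np.argmax(samples))
        r = arms[i](len(history[i]))      # reward depends on the pull count
        history[i].append(r)
        collected.append(r)
    return collected

# Illustrative rising arms: Bernoulli means that increase with the pull count.
rng = np.random.default_rng(0)
arms = [lambda n, c=c: float(rng.random() < min(0.9, c + 0.02 * n))
        for c in (0.1, 0.3)]
print(sum(sliding_window_ts(arms, horizon=500, window=50, rng=rng)))
```

The window is what makes this a candidate for rising rewards: the posterior tracks each arm's recent (higher) reward level instead of averaging over its entire, improving history.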
Related papers
- State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards [18.0878149546412]
This paper considers the setting of recovering bandits, where the reward depends on the number of rounds elapsed since the last time an arm was pulled.
We propose a new reinforcement learning (RL) algorithm tailored to this setting, named State-Separated SARSA (SS-SARSA), which treats rounds as states.
arXiv Detail & Related papers (2024-03-18T07:14:21Z) - TS-RSR: A provably efficient approach for batch bayesian optimization [4.622871908358325]
This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio (TS-RSR) directed sampling.
Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points.
We demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions.
arXiv Detail & Related papers (2024-03-07T18:58:26Z) - Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z) - Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z) - Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows the design of specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Batched Thompson Sampling for Multi-Armed Bandits [9.467098519620263]
We study Thompson Sampling algorithms for multi-armed bandits in the batched setting.
We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2021-08-15T20:47:46Z) - Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits [2.879036956042183]
We consider the non-stationary multi-armed bandit (MAB) framework and propose a Kolmogorov-Smirnov (KS) test based Thompson Sampling (TS) algorithm named TS-KS.
In particular, for the two-armed bandit case, we derive bounds on the number of samples of the reward distribution to detect the change once it occurs.
Our results show that the proposed TS-KS algorithm outperforms not only the static TS algorithm but also other bandit algorithms designed for non-stationary environments.
arXiv Detail & Related papers (2021-05-30T17:28:41Z) - Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm, CVaR-TS, under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z) - Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
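Several of the related papers above (e.g., the batched, mean-variance, and non-stationary variants) adapt the classical stationary Beta-Bernoulli Thompson Sampling routine. For reference, a minimal sketch of that common baseline follows, assuming Bernoulli rewards with known, fixed arm means for simulation; the names and setup are illustrative, not taken from any of the listed papers.

```python
import numpy as np

def bernoulli_ts(reward_probs, horizon, rng=None):
    """Classical stationary Beta-Bernoulli Thompson Sampling baseline."""
    rng = rng or np.random.default_rng()
    k = len(reward_probs)
    wins = np.zeros(k)
    losses = np.zeros(k)
    total = 0.0
    for _ in range(horizon):
        theta = rng.beta(wins + 1, losses + 1)     # one posterior draw per arm
        i = int(np.argmax(theta))
        r = float(rng.random() < reward_probs[i])  # Bernoulli reward
        wins[i] += r
        losses[i] += 1.0 - r
        total += r
    return total

print(bernoulli_ts([0.2, 0.5, 0.7], horizon=2000, rng=np.random.default_rng(1)))
```

The stationary version keeps every observation in the posterior; the rising-bandit variants discussed above differ precisely in how they discount or discard old observations.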
This list is automatically generated from the titles and abstracts of the papers on this site.