On Penalization in Stochastic Multi-armed Bandits
- URL: http://arxiv.org/abs/2211.08311v1
- Date: Tue, 15 Nov 2022 17:13:09 GMT
- Title: On Penalization in Stochastic Multi-armed Bandits
- Authors: Guanhua Fang, Ping Li, Gennady Samorodnitsky
- Abstract summary: We study an important variant of the multi-armed bandit (MAB) problem, which takes penalization into consideration.
We propose a hard-threshold UCB-like algorithm, which enjoys many merits including fairness, nearly optimal regret, and a better tradeoff between reward and fairness.
- Score: 22.04356596828437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study an important variant of the stochastic multi-armed bandit (MAB)
problem, which takes penalization into consideration. Instead of directly
maximizing the cumulative expected reward, we need to balance the total
reward against the fairness level. In this paper, we present some new insights into MAB
and formulate the problem in the penalization framework, where rigorous
penalized regret can be well defined and more sophisticated regret analysis is
possible. Under such a framework, we propose a hard-threshold UCB-like
algorithm, which enjoys many merits including asymptotic fairness, nearly
optimal regret, and a better tradeoff between reward and fairness. Both gap-dependent
and gap-independent regret bounds have been established. Multiple insightful
comments are given to illustrate the soundness of our theoretical analysis.
Numerous experimental results corroborate the theory and show the superiority
of our method over other existing methods.
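The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of a "hard-threshold UCB-like" rule: every arm is guaranteed a minimum share of pulls (a hard fairness threshold), and otherwise the arm with the largest UCB index is played. The threshold tau, the confidence bonus, and all names are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def hard_threshold_ucb(pull, n_arms, horizon, tau=0.05, seed=0):
    """Illustrative hard-threshold UCB-like rule for a penalized MAB.

    tau is an assumed per-arm fairness threshold: any arm whose pull count
    falls below tau * t is played before the UCB index is consulted.  This
    is a plausible reading of the abstract, not the authors' algorithm.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)

    for t in range(1, horizon + 1):
        if counts.min() == 0:
            arm = int(np.argmin(counts))              # play each arm once first
        else:
            starving = np.where(counts < tau * t)[0]  # hard fairness threshold
            if starving.size > 0:
                arm = int(starving[np.argmin(counts[starving])])
            else:
                ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
                arm = int(np.argmax(ucb))
        reward = pull(arm, rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums / counts

# Example with three Bernoulli arms and a 5% fairness floor per arm.
means = [0.3, 0.5, 0.7]
pulls, estimates = hard_threshold_ucb(
    lambda a, rng: rng.binomial(1, means[a]), n_arms=3, horizon=10_000)
print(pulls, estimates)
```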
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
This problem variant elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
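The Double KL-UCB variant itself is not described in this summary, but the KL-UCB index it builds on is standard: an arm's index is the largest mean q whose Bernoulli KL divergence from the empirical mean fits within an exploration budget. A minimal bisection-based computation of that standard index (not the paper's Double KL-UCB) might look like this:

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean_hat, n_pulls, t, c=3.0, iters=50):
    """Largest q >= mean_hat with n_pulls * KL(mean_hat, q) <= log t + c*log log t.

    Solved by bisection; this is the standard KL-UCB index, shown only as
    background for the KL-UCB family of algorithms.
    """
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if n_pulls * bernoulli_kl(mean_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# e.g. an arm with empirical mean 0.4 after 20 pulls at round t = 100
print(kl_ucb_index(0.4, 20, 100))
```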
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints [29.596684377841182]
In this study, we consider the infinitely many-armed bandit problem in a rested rotting setting, where the mean reward of an arm may decrease with each pull and otherwise remains unchanged.
We introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards.
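The adaptive window choice is the paper's contribution and is not reproduced here; a fixed-window variant, sketched below under that simplifying assumption, still conveys the idea that only an arm's recent rewards should inform its estimate when rewards rot.

```python
from collections import deque
import math

def sliding_window_ucb(pull, K, T, window=50, c=2.0):
    """Fixed-window UCB sketch for (rested) rotting rewards.

    Each arm's mean is estimated from its most recent `window` rewards so
    the estimate can track a mean that decays as the arm is pulled.  The
    fixed window size and bonus constant c are illustrative assumptions;
    the cited paper adapts the window rather than fixing it.
    """
    recent = [deque(maxlen=window) for _ in range(K)]

    for t in range(1, T + 1):
        untried = [a for a in range(K) if len(recent[a]) == 0]
        if untried:
            arm = untried[0]                          # initialize each arm once
        else:
            def index(a):
                n = len(recent[a])
                return sum(recent[a]) / n + math.sqrt(c * math.log(t) / n)
            arm = max(range(K), key=index)
        recent[arm].append(pull(arm))                 # only recent rewards kept
    return [sum(d) / len(d) for d in recent]

# Example: rested rotting arms whose means decay by 1% per pull.
import random
base = [0.9, 0.6, 0.3]
times_pulled = [0, 0, 0]
def rotting_pull(a):
    times_pulled[a] += 1
    return float(random.random() < base[a] * 0.99 ** times_pulled[a])
print(sliding_window_ucb(rotting_pull, K=3, T=5000, window=40))
```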
arXiv Detail & Related papers (2024-04-22T14:11:54Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)
- Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm.
We show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives.
We also show that BoBW-lil'UCB$(\gamma)$ outperforms a competitor in terms of time complexity and regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret [5.1398743023989555]
We study the nonstationary Multi-Armed Bandit (MAB) problem, in which the reward distribution associated with each arm is assumed to be time-varying.
We characterize the performance of the proposed policies in terms of the worst-case regret, which is the supremum of the regret over the set of reward distribution sequences satisfying the variation budget.
arXiv Detail & Related papers (2021-01-22T07:34:09Z)
- A One-Size-Fits-All Solution to Conservative Bandit Problems [32.907883501286]
We study a family of conservative bandit problems (CBPs) with sample-path reward constraints.
We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems.
arXiv Detail & Related papers (2020-12-14T08:50:23Z)
- Reward Biased Maximum Likelihood Estimation for Reinforcement Learning [13.820705458648233]
The Reward-Biased Maximum Likelihood Estimate (RBMLE) was originally proposed for adaptive control of Markov chains.
We show that it has a regret of $\mathcal{O}(\log T)$ over a time horizon of $T$ steps, similar to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-11-16T06:09:56Z)
- Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards [24.983866845065926]
We consider multi-armed bandits with heavy-tailed rewards, whose $p$-th moment is bounded by a constant $\nu_p$ for $1 < p \leq 2$.
We propose a novel robust estimator which does not require $\nu_p$ as prior information.
We show that the error probability of the proposed estimator decays exponentially fast.
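The proposed estimator is not specified in this summary; as background, median-of-means is a classical robust mean estimator whose error probability also decays exponentially under weak moment assumptions, and it illustrates the style of guarantee being claimed. The sketch below is that baseline, not the paper's estimator.

```python
import numpy as np

def median_of_means(samples, n_blocks=8):
    """Median-of-means: a standard robust mean estimator for heavy tails.

    Split the data into blocks, average each block, and return the median
    of the block means; the error probability decays exponentially in the
    number of blocks even when only low-order moments exist.  This is a
    classical baseline, not the estimator proposed in the cited paper.
    """
    samples = np.asarray(samples, dtype=float)
    n_blocks = min(n_blocks, len(samples))
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# e.g. heavy-tailed (Pareto) rewards with true mean 2.5/1.5 ~ 1.667
rng = np.random.default_rng(0)
x = rng.pareto(2.5, size=400) + 1.0
print(median_of_means(x), x.mean())
```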
arXiv Detail & Related papers (2020-10-24T10:44:02Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
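The summary does not give the exact algorithm, so the sketch below shows one common way to run Thompson Sampling on a mean-variance objective for Bernoulli arms: sample a success probability from each arm's Beta posterior and play the arm whose sampled mean-variance value is best. The objective's sign convention, the flat prior, and the risk-tolerance parameter rho are assumptions for illustration, not the algorithm analyzed in the cited paper.

```python
import numpy as np

def mv_thompson_bernoulli(pull, K, T, rho=1.0, seed=0):
    """Sketch of Thompson Sampling for a mean-variance Bernoulli bandit.

    Each arm keeps a Beta posterior over its success probability.  At each
    round we draw theta from every posterior and play the arm maximizing
    the sampled objective rho * theta - theta * (1 - theta), i.e. mean
    minus variance scaled by a risk tolerance rho (an assumed convention).
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(K)   # Beta posterior: 1 + observed successes
    beta = np.ones(K)    # Beta posterior: 1 + observed failures

    for _ in range(T):
        theta = rng.beta(alpha, beta)
        mv = rho * theta - theta * (1.0 - theta)   # sampled mean-variance value
        a = int(np.argmax(mv))
        r = pull(a)                                # 0/1 reward
        alpha[a] += r
        beta[a] += 1 - r
    return alpha, beta

# Example: risk-averse play (small rho) over arms with means 0.4, 0.5, 0.9
mus = [0.4, 0.5, 0.9]
env_rng = np.random.default_rng(1)
print(mv_thompson_bernoulli(lambda a: env_rng.binomial(1, mus[a]), K=3, T=5000, rho=0.2))
```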
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.