An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints
- URL: http://arxiv.org/abs/2404.14202v3
- Date: Sun, 13 Oct 2024 07:32:54 GMT
- Title: An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints
- Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun,
- Abstract summary: In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged.
We introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards.
- Score: 29.596684377841182
- License:
- Abstract: In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards. Our proposed algorithm achieves tight regret bounds for both slow and abrupt rotting scenarios. Lastly, we demonstrate the performance of our algorithm using numerical experiments.
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$textOFUL+$ and Restarted $textSAVE+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$.
Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Contextual bandits with concave rewards, and an application to fair
ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR)
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z) - The price of unfairness in linear bandits with biased feedback [62.25313751895011]
We study the problem of sequential decision making with biased linear bandit feedback.
We show that the worst case regret is higher than the dT 1/2 log(T) regret rate obtained under unbiased feedback.
Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
arXiv Detail & Related papers (2022-03-18T08:03:20Z) - Risk-Aware Algorithms for Combinatorial Semi-Bandits [7.716156977428555]
We study the multi-armed bandit problem under semi-bandit feedback.
We consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards.
We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the bandit.
arXiv Detail & Related papers (2021-12-02T11:29:43Z) - On Slowly-varying Non-stationary Bandits [25.305949034527202]
We consider dynamic regret in non-stationary bandits with a slowly varying property.
We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits.
We show that our algorithm is essentially minimax optimal.
arXiv Detail & Related papers (2021-10-25T12:56:19Z) - Multi-armed Bandit Algorithm against Strategic Replication [5.235979896921492]
We consider a multi-armed bandit problem in which a set of arms is registered by each agent, and the agent receives reward when its arm is selected.
An agent might strategically submit more arms with replications, which can bring more reward by abusing the bandit algorithm's exploration-exploitation balance.
We propose a bandit algorithm which demotivates replications and also achieves a small cumulative regret.
arXiv Detail & Related papers (2021-10-23T07:38:44Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually zooms in'' on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.