Multi-armed Bandit Requiring Monotone Arm Sequences
- URL: http://arxiv.org/abs/2106.03790v2
- Date: Tue, 8 Jun 2021 19:52:14 GMT
- Title: Multi-armed Bandit Requiring Monotone Arm Sequences
- Authors: Ningyuan Chen
- Abstract summary: We consider the continuum-armed bandit problem when the arm sequence is required to be monotone.
We show that when the unknown objective function is Lipschitz continuous and, in addition, unimodal or quasiconcave, the regret is $\tilde O(T^{3/4})$ under the proposed algorithm.
This deviates from the optimal rate $\tilde O(T^{2/3})$ in the continuous-armed bandit literature and demonstrates the cost in learning efficiency incurred by the monotonicity requirement.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many online learning or multi-armed bandit problems, the actions taken or
arms pulled are ordinal and required to be monotone over time. Examples include
dynamic pricing, in which firms use markup pricing policies to please early
adopters and deter strategic waiting, and clinical trials, in which the dose
allocation usually follows the dose escalation principle to prevent dose
limiting toxicities. We consider the continuum-armed bandit problem when the
arm sequence is required to be monotone. We show that when the unknown
objective function is Lipschitz continuous, the regret is $O(T)$. When in
addition the objective function is unimodal or quasiconcave, the regret is
$\tilde O(T^{3/4})$ under the proposed algorithm, which is also shown to be the
optimal rate. This deviates from the optimal rate $\tilde O(T^{2/3})$ in the
continuous-armed bandit literature and demonstrates the cost in learning
efficiency incurred by the monotonicity requirement.
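The $\tilde O(T^{3/4})$ upper bound is achieved by a policy that may only move the arm in one direction, so overshooting the peak can never be undone. Below is a minimal sketch of such a monotone sweep for a unimodal objective on $[0,1]$; the grid width, batch size, and stopping rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def monotone_unimodal_bandit(reward_fn, T, noise=0.1, seed=0):
    """Sketch of a monotone-arm policy for a unimodal objective on [0, 1].

    The arm sequence only moves rightward. Each grid point is sampled in
    a batch; once the batch mean drops, the policy stops advancing and
    commits to the *current* arm, since moving back is forbidden. Roughly
    T^{1/4} grid points with about T^{1/2} pulls each (assumptions).
    """
    rng = np.random.default_rng(seed)
    k = max(2, int(round(T ** 0.25)))        # grid points (assumption)
    batch = max(1, int(round(T ** 0.5)))     # pulls per point (assumption)
    t, prev_mean, x = 0, -np.inf, 0.0
    for x in np.linspace(0.0, 1.0, k):       # monotone left-to-right sweep
        if t >= T:
            break
        n = min(batch, T - t)
        obs = reward_fn(x) + noise * rng.standard_normal(n)
        t += n
        if obs.mean() < prev_mean:           # empirical peak just passed
            break
        prev_mean = obs.mean()
    while t < T:                             # commit to the current arm
        _ = reward_fn(x) + noise * rng.standard_normal()
        t += 1
    return x
```

For example, `monotone_unimodal_bandit(lambda x: 1 - (x - 0.6) ** 2, T=10_000)` typically returns a point near 0.6. Heuristically, the $T^{-1/4}$ grid spacing makes the Lipschitz discretization error, the batch-mean noise, and the one-step overshoot all balance at total regret of order $T^{3/4}$; an unconstrained policy could instead revisit and refine around the peak, recovering $\tilde O(T^{2/3})$.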
Related papers
- Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback [11.262167746929332]
Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems.
In this paper, we consider the task of learning in episodic RMAB with unknown transition functions and adversarial rewards.
We develop a novel reinforcement learning algorithm with two key contributions.
arXiv Detail & Related papers (2024-05-02T02:20:19Z)
- Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality [46.445321251991096]
In recommender system or crowdsourcing applications, a human's preferences or abilities are often a function of the algorithm's recent actions.
We introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss be a function of a summation of the number of times that arm was played in the last $m$ timesteps.
We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $O\left(\sqrt{KT} + m + \cdots\right)$ regret.
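For reference, here is a minimal sketch of the vanilla successive elimination routine that the paper modifies; the WTB-specific handling of the $m$-step memory is not reproduced, and `pull`, `delta`, and the confidence radius are illustrative assumptions.

```python
import numpy as np

def successive_elimination(pull, K, T, delta=0.05):
    """Vanilla successive elimination for a K-armed stochastic bandit.

    Plays every surviving arm once per round, then eliminates any arm
    whose upper confidence bound falls below the best lower confidence
    bound. `pull(a)` returns a noisy reward in [0, 1] for arm a.
    """
    active = list(range(K))
    sums, counts, t = np.zeros(K), np.zeros(K), 0
    while t < T:
        for a in list(active):               # one pull per surviving arm
            if t >= T:
                break
            sums[a] += pull(a)
            counts[a] += 1
            t += 1
        if t >= T or len(active) == 1:
            break
        means = sums[active] / counts[active]
        rad = np.sqrt(np.log(4 * K * counts[active] ** 2 / delta)
                      / (2 * counts[active]))
        best_lcb = np.max(means - rad)       # best pessimistic estimate
        active = [a for a, m, r in zip(active, means, rad)
                  if m + r >= best_lcb]      # keep plausibly-best arms
    return active
```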
arXiv Detail & Related papers (2023-05-04T15:59:43Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde O(d^{2/3} T^{2/3})$ regret, which is also optimal.
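The Borda score of an item is its probability of winning a duel against an opponent drawn uniformly at random. A naive explore-then-commit estimator is sketched below; it ignores the generalized linear structure the paper exploits to reach the $\tilde O(d^{2/3} T^{2/3})$ rate, and `duel` and the budget split are assumptions.

```python
import numpy as np

def borda_explore_then_commit(duel, K, budget, seed=0):
    """Estimate Borda scores by dueling random pairs, then commit to
    the empirical Borda winner. `duel(i, j)` returns 1 if item i beats
    item j, else 0."""
    rng = np.random.default_rng(seed)
    wins, plays = np.zeros(K), np.zeros(K)
    for _ in range(budget):
        i = int(rng.integers(K))             # item under evaluation
        j = int(rng.integers(K))             # uniformly random opponent
        wins[i] += duel(i, j)
        plays[i] += 1
    scores = np.where(plays > 0, wins / np.maximum(plays, 1.0), 0.0)
    return int(np.argmax(scores))            # empirical Borda winner
```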
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Problem Dependent View on Structured Thresholding Bandit Problems [73.70176003598449]
We investigate the problem-dependent regime in the Thresholding Bandit problem (TBP).
The objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold.
We provide upper and lower bounds for the probability of error in both the concave and monotone settings.
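As a baseline for intuition, the simplest fixed-budget strategy for TBP samples every arm equally and reports the empirically above-threshold set; `pull`, the threshold `tau`, and the equal split are assumptions, and the paper's algorithms further exploit the concave and monotone structure.

```python
import numpy as np

def thresholding_uniform(pull, K, T, tau):
    """Fixed-budget uniform baseline for the thresholding bandit:
    sample each arm T // K times, then output every arm whose
    empirical mean exceeds the threshold tau."""
    n = max(1, T // K)
    means = np.array([np.mean([pull(a) for _ in range(n)])
                      for a in range(K)])
    return [a for a in range(K) if means[a] > tau]
```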
arXiv Detail & Related papers (2021-06-18T15:01:01Z)
- Asymptotically Optimal Bandits under Weighted Information [10.939683083130616]
We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round.
We propose a Thompson-Sampling-based strategy that designs the power profile as its posterior belief that each arm is the best arm.
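A minimal Beta-Bernoulli sketch of that design: the power given to each arm is a Monte-Carlo estimate of the posterior probability that it is the best arm. The weighted-information observation model itself is omitted, and the posterior family and sample count are assumptions.

```python
import numpy as np

def ts_power_profile(alpha, beta, n_samples=1000, seed=0):
    """Power profile over K arms: the estimated posterior probability,
    under independent Beta(alpha[i], beta[i]) posteriors, that each
    arm has the highest mean."""
    rng = np.random.default_rng(seed)
    draws = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    best = np.argmax(draws, axis=1)          # best arm in each sample
    return np.bincount(best, minlength=len(alpha)) / n_samples

# Example: the arm with the most favorable posterior receives most power.
profile = ts_power_profile(alpha=np.array([2.0, 8.0, 3.0]),
                           beta=np.array([8.0, 2.0, 7.0]))
```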
arXiv Detail & Related papers (2021-05-28T21:28:44Z)
- Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit problem (MAB).
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
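A toy sketch of the two ingredients, doubling epochs plus exact best-subset selection by enumeration, which is only tractable for small ambient dimension; the interfaces below are assumptions, not the paper's implementation.

```python
import itertools
import numpy as np

def best_subset_ols(X, y, s):
    """Exact best-subset selection: fit OLS on every support of size s
    and keep the fit with the smallest residual (tiny d only)."""
    d = X.shape[1]
    best_err, best_theta = np.inf, np.zeros(d)
    for support in itertools.combinations(range(d), s):
        cols = list(support)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        err = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if err < best_err:
            best_err, best_theta = err, np.zeros(d)
            best_theta[cols] = coef
    return best_theta

def epoch_schedule(T):
    """Doubly growing epochs: lengths 1, 2, 4, ..., truncated at T.
    The parameter is re-estimated once at the start of each epoch."""
    t, length = 0, 1
    while t < T:
        yield t, min(length, T - t)
        t += length
        length *= 2
```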
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption level $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms in the corrupted case.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
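For the contextual result, "greedy" means acting on the ridge-regression estimate with no exploration bonus at all; a minimal sketch follows, where the `contexts`/`reward` interface and the regularizer `lam` are assumptions.

```python
import numpy as np

def greedy_contextual_linear(contexts, reward, T, d, lam=1.0):
    """Greedy contextual linear bandit: maintain a ridge-regression
    estimate of theta and always play the action with the highest
    estimated reward -- no explicit exploration.

    `contexts(t)` returns the round-t action features (n_actions x d);
    `reward(x)` returns the noisy reward of playing feature vector x.
    """
    A = lam * np.eye(d)                      # regularized Gram matrix
    b = np.zeros(d)
    for t in range(T):
        theta = np.linalg.solve(A, b)        # ridge estimate
        X = contexts(t)
        x = X[int(np.argmax(X @ theta))]     # purely greedy choice
        r = reward(x)
        A += np.outer(x, x)                  # rank-one update
        b += r * x
    return np.linalg.solve(A, b)
```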
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
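A sketch of the selection rule the name suggests: optimistic (UCB) on reward, pessimistic (UCB) on constraint cost, then the best optimistic action among those that still look feasible. This is not the paper's exact algorithm; the interface, the shared width multiplier `beta`, and the cost threshold `tau` are assumptions.

```python
import numpy as np

def oplb_style_choice(X, A_r, b_r, A_c, b_c, beta, tau):
    """One round of optimistic-pessimistic selection for a constrained
    linear bandit. X holds candidate action features (n x d); (A_r, b_r)
    and (A_c, b_c) are ridge statistics for the reward and cost models.
    """
    th_r = np.linalg.solve(A_r, b_r)         # reward estimate
    th_c = np.linalg.solve(A_c, b_c)         # cost estimate
    inv_r, inv_c = np.linalg.inv(A_r), np.linalg.inv(A_c)
    w_r = beta * np.sqrt(np.einsum('ij,jk,ik->i', X, inv_r, X))
    w_c = beta * np.sqrt(np.einsum('ij,jk,ik->i', X, inv_c, X))
    reward_ucb = X @ th_r + w_r              # optimism for reward
    cost_ucb = X @ th_c + w_c                # pessimism for the constraint
    feasible = cost_ucb <= tau
    if not feasible.any():                   # fall back to least cost
        return int(np.argmin(cost_ucb))
    idx = np.flatnonzero(feasible)
    return int(idx[np.argmax(reward_ucb[idx])])
```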
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.