Multi-armed Bandit Requiring Monotone Arm Sequences
- URL: http://arxiv.org/abs/2106.03790v2
- Date: Tue, 8 Jun 2021 19:52:14 GMT
- Title: Multi-armed Bandit Requiring Monotone Arm Sequences
- Authors: Ningyuan Chen
- Abstract summary: We consider the continuum-armed bandit problem when the arm sequence is required to be monotone.
We show that when the unknown objective function is Lipschitz continuous and, in addition, unimodal or quasiconcave, the regret is $\tilde O(T^{3/4})$ under the proposed algorithm.
This deviates from the optimal rate $\tilde O(T^{2/3})$ in the continuous-armed bandit literature and demonstrates the cost in learning efficiency incurred by the monotonicity requirement.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many online learning or multi-armed bandit problems, the actions taken or
arms pulled are ordinal and required to be monotone over time. Examples include
dynamic pricing, in which firms use markup pricing policies to please early
adopters and deter strategic waiting, and clinical trials, in which the dose
allocation usually follows the dose escalation principle to prevent dose
limiting toxicities. We consider the continuum-armed bandit problem when the
arm sequence is required to be monotone. We show that when the unknown
objective function is Lipschitz continuous, the regret is $O(T)$. When in
addition the objective function is unimodal or quasiconcave, the regret is
$\tilde O(T^{3/4})$ under the proposed algorithm, which is also shown to be the
optimal rate. This deviates from the optimal rate $\tilde O(T^{2/3})$ in the
continuous-armed bandit literature and demonstrates the cost in learning
efficiency incurred by the monotonicity requirement.
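The $\tilde O(T^{3/4})$ upper bound is achieved by a policy that may only move the arm in one direction, so overshooting the peak can never be undone. Below is a minimal sketch of such a monotone sweep for a unimodal objective on $[0,1]$; the grid width, batch size, and stopping rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def monotone_unimodal_bandit(reward_fn, T, noise=0.1, seed=0):
    """Sketch of a monotone-arm policy for a unimodal objective on [0, 1].

    The arm sequence only moves rightward. Each grid point is sampled in
    a batch; once the batch mean drops, the policy stops advancing and
    commits to the *current* arm, since moving back is forbidden. Roughly
    T^{1/4} grid points with about T^{1/2} pulls each (assumptions).
    """
    rng = np.random.default_rng(seed)
    k = max(2, int(round(T ** 0.25)))        # grid points (assumption)
    batch = max(1, int(round(T ** 0.5)))     # pulls per point (assumption)
    t, prev_mean, x = 0, -np.inf, 0.0
    for x in np.linspace(0.0, 1.0, k):       # monotone left-to-right sweep
        if t >= T:
            break
        n = min(batch, T - t)
        obs = reward_fn(x) + noise * rng.standard_normal(n)
        t += n
        if obs.mean() < prev_mean:           # empirical peak just passed
            break
        prev_mean = obs.mean()
    while t < T:                             # commit to the current arm
        _ = reward_fn(x) + noise * rng.standard_normal()
        t += 1
    return x
```

For example, `monotone_unimodal_bandit(lambda x: 1 - (x - 0.6) ** 2, T=10_000)` typically returns a point near 0.6. Heuristically, the $T^{-1/4}$ grid spacing makes the Lipschitz discretization error, the batch-mean noise, and the one-step overshoot all balance at total regret of order $T^{3/4}$; an unconstrained policy could instead revisit and refine around the peak, recovering $\tilde O(T^{2/3})$.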
Related papers
- Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback [11.262167746929332]
Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems.
In this paper, we consider the task of learning in episodic RMAB with unknown transition functions and adversarial rewards.
We develop a novel reinforcement learning algorithm with two key contributions.
arXiv Detail & Related papers (2024-05-02T02:20:19Z)
- Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality [46.445321251991096]
In recommender system or crowdsourcing applications, a human's preferences or abilities are often a function of the algorithm's recent actions.
We introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss be a function of a summation of the number of times that arm was played in the last $m$ timesteps.
We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $O\left(\sqrt{KT} + m + \cdots\right)$ regret.
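For reference, here is a minimal sketch of the vanilla successive elimination routine that the paper modifies; the WTB-specific handling of the $m$-step memory is not reproduced, and `pull`, `delta`, and the confidence radius are illustrative assumptions.

```python
import numpy as np

def successive_elimination(pull, K, T, delta=0.05):
    """Vanilla successive elimination for a K-armed stochastic bandit.

    Plays every surviving arm once per round, then eliminates any arm
    whose upper confidence bound falls below the best lower confidence
    bound. `pull(a)` returns a noisy reward in [0, 1] for arm a.
    """
    active = list(range(K))
    sums, counts, t = np.zeros(K), np.zeros(K), 0
    while t < T:
        for a in list(active):               # one pull per surviving arm
            if t >= T:
                break
            sums[a] += pull(a)
            counts[a] += 1
            t += 1
        if t >= T or len(active) == 1:
            break
        means = sums[active] / counts[active]
        rad = np.sqrt(np.log(4 * K * counts[active] ** 2 / delta)
                      / (2 * counts[active]))
        best_lcb = np.max(means - rad)       # best pessimistic estimate
        active = [a for a, m, r in zip(active, means, rad)
                  if m + r >= best_lcb]      # keep plausibly-best arms
    return active
```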
arXiv Detail & Related papers (2023-05-04T15:59:43Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde O(d^{2/3} T^{2/3})$ regret, which is also optimal.
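The Borda score of an item is its probability of winning a duel against an opponent drawn uniformly at random. A naive explore-then-commit estimator is sketched below; it ignores the generalized linear structure the paper exploits to reach the $\tilde O(d^{2/3} T^{2/3})$ rate, and `duel` and the budget split are assumptions.

```python
import numpy as np

def borda_explore_then_commit(duel, K, budget, seed=0):
    """Estimate Borda scores by dueling random pairs, then commit to
    the empirical Borda winner. `duel(i, j)` returns 1 if item i beats
    item j, else 0."""
    rng = np.random.default_rng(seed)
    wins, plays = np.zeros(K), np.zeros(K)
    for _ in range(budget):
        i = int(rng.integers(K))             # item under evaluation
        j = int(rng.integers(K))             # uniformly random opponent
        wins[i] += duel(i, j)
        plays[i] += 1
    scores = np.where(plays > 0, wins / np.maximum(plays, 1.0), 0.0)
    return int(np.argmax(scores))            # empirical Borda winner
```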
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Problem Dependent View on Structured Thresholding Bandit Problems [73.70176003598449]
We investigate the problem-dependent regime in the Thresholding Bandit problem (TBP).
The objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold.
We provide upper and lower bounds for the probability of error in both the concave and monotone settings.
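As a baseline for intuition, the simplest fixed-budget strategy for TBP samples every arm equally and reports the empirically above-threshold set; `pull`, the threshold `tau`, and the equal split are assumptions, and the paper's algorithms further exploit the concave and monotone structure.

```python
import numpy as np

def thresholding_uniform(pull, K, T, tau):
    """Fixed-budget uniform baseline for the thresholding bandit:
    sample each arm T // K times, then output every arm whose
    empirical mean exceeds the threshold tau."""
    n = max(1, T // K)
    means = np.array([np.mean([pull(a) for _ in range(n)])
                      for a in range(K)])
    return [a for a in range(K) if means[a] > tau]
```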
arXiv Detail & Related papers (2021-06-18T15:01:01Z)
- Asymptotically Optimal Bandits under Weighted Information [10.939683083130616]
We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round.
We propose a Thompson-Sampling-based strategy that designs the power profile as its posterior belief that each arm is the best arm.
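A minimal Beta-Bernoulli sketch of that design: the power given to each arm is a Monte-Carlo estimate of the posterior probability that it is the best arm. The weighted-information observation model itself is omitted, and the posterior family and sample count are assumptions.

```python
import numpy as np

def ts_power_profile(alpha, beta, n_samples=1000, seed=0):
    """Power profile over K arms: the estimated posterior probability,
    under independent Beta(alpha[i], beta[i]) posteriors, that each
    arm has the highest mean."""
    rng = np.random.default_rng(seed)
    draws = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    best = np.argmax(draws, axis=1)          # best arm in each sample
    return np.bincount(best, minlength=len(alpha)) / n_samples

# Example: the arm with the most favorable posterior receives most power.
profile = ts_power_profile(alpha=np.array([2.0, 8.0, 3.0]),
                           beta=np.array([8.0, 2.0, 7.0]))
```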
arXiv Detail & Related papers (2021-05-28T21:28:44Z)
- Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit problem (MAB).
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
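A toy sketch of the two ingredients, doubling epochs plus exact best-subset selection by enumeration, which is only tractable for small ambient dimension; the interfaces below are assumptions, not the paper's implementation.

```python
import itertools
import numpy as np

def best_subset_ols(X, y, s):
    """Exact best-subset selection: fit OLS on every support of size s
    and keep the fit with the smallest residual (tiny d only)."""
    d = X.shape[1]
    best_err, best_theta = np.inf, np.zeros(d)
    for support in itertools.combinations(range(d), s):
        cols = list(support)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        err = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if err < best_err:
            best_err, best_theta = err, np.zeros(d)
            best_theta[cols] = coef
    return best_theta

def epoch_schedule(T):
    """Doubly growing epochs: lengths 1, 2, 4, ..., truncated at T.
    The parameter is re-estimated once at the start of each epoch."""
    t, length = 0, 1
    while t < T:
        yield t, min(length, T - t)
        t += length
        length *= 2
```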
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption level $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms in the corrupted case.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
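For the contextual result, "greedy" means acting on the ridge-regression estimate with no exploration bonus at all; a minimal sketch follows, where the `contexts`/`reward` interface and the regularizer `lam` are assumptions.

```python
import numpy as np

def greedy_contextual_linear(contexts, reward, T, d, lam=1.0):
    """Greedy contextual linear bandit: maintain a ridge-regression
    estimate of theta and always play the action with the highest
    estimated reward -- no explicit exploration.

    `contexts(t)` returns the round-t action features (n_actions x d);
    `reward(x)` returns the noisy reward of playing feature vector x.
    """
    A = lam * np.eye(d)                      # regularized Gram matrix
    b = np.zeros(d)
    for t in range(T):
        theta = np.linalg.solve(A, b)        # ridge estimate
        X = contexts(t)
        x = X[int(np.argmax(X @ theta))]     # purely greedy choice
        r = reward(x)
        A += np.outer(x, x)                  # rank-one update
        b += r * x
    return np.linalg.solve(A, b)
```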
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic-pessimistic linear bandit (OPLB).
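A sketch of the selection rule the name suggests: optimistic (UCB) on reward, pessimistic (UCB) on constraint cost, then the best optimistic action among those that still look feasible. This is not the paper's exact algorithm; the interface, the shared width multiplier `beta`, and the cost threshold `tau` are assumptions.

```python
import numpy as np

def oplb_style_choice(X, A_r, b_r, A_c, b_c, beta, tau):
    """One round of optimistic-pessimistic selection for a constrained
    linear bandit. X holds candidate action features (n x d); (A_r, b_r)
    and (A_c, b_c) are ridge statistics for the reward and cost models.
    """
    th_r = np.linalg.solve(A_r, b_r)         # reward estimate
    th_c = np.linalg.solve(A_c, b_c)         # cost estimate
    inv_r, inv_c = np.linalg.inv(A_r), np.linalg.inv(A_c)
    w_r = beta * np.sqrt(np.einsum('ij,jk,ik->i', X, inv_r, X))
    w_c = beta * np.sqrt(np.einsum('ij,jk,ik->i', X, inv_c, X))
    reward_ucb = X @ th_r + w_r              # optimism for reward
    cost_ucb = X @ th_c + w_c                # pessimism for the constraint
    feasible = cost_ucb <= tau
    if not feasible.any():                   # fall back to least cost
        return int(np.argmin(cost_ucb))
    idx = np.flatnonzero(feasible)
    return int(idx[np.argmax(reward_ucb[idx])])
```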
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.