Best Arm Identification in Batched Multi-armed Bandit Problems
- URL: http://arxiv.org/abs/2312.13875v1
- Date: Thu, 21 Dec 2023 14:16:38 GMT
- Title: Best Arm Identification in Batched Multi-armed Bandit Problems
- Authors: Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan
- Abstract summary: Recently multi-armed bandit problem arises in many real-life scenarios where arms must be sampled in batches.
We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification.
We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.
- Score: 3.908867017036043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently multi-armed bandit problem arises in many real-life scenarios where
arms must be sampled in batches, due to limited time the agent can wait for the
feedback. Such applications include biological experimentation and online
marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Differential Good Arm Identification [4.666048091337632]
This paper targets a variant of the multi-armed bandit problem called good arm identification (GAI)
GAI is a pure-exploration bandit problem with the goal to output as many good arms using as few samples as possible.
We propose DGAI - a differentiable good arm identification algorithm.
arXiv Detail & Related papers (2023-03-13T14:28:21Z) - Dueling Bandits: From Two-dueling to Multi-dueling [40.4378338001229]
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms.
This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options.
arXiv Detail & Related papers (2022-11-16T22:00:54Z) - Batched Thompson Sampling for Multi-Armed Bandits [9.467098519620263]
We study Thompson Sampling algorithms for multi-armed bandits in the batched setting.
We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2021-08-15T20:47:46Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.