Best Arm Identification in Batched Multi-armed Bandit Problems
- URL: http://arxiv.org/abs/2312.13875v1
- Date: Thu, 21 Dec 2023 14:16:38 GMT
- Title: Best Arm Identification in Batched Multi-armed Bandit Problems
- Authors: Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan
- Abstract summary: Recently, the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches.
We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification.
We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.
- Score: 3.908867017036043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the multi-armed bandit problem has arisen in many real-life
scenarios where arms must be sampled in batches, due to the limited time the
agent can wait for feedback. Such applications include biological experimentation and online
marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
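
To make the two-stage idea concrete, here is a minimal illustrative Python sketch, not the authors' LP-based algorithm: the function `two_stage_batched_bai`, its parameters (`stage1_frac`, `top_frac`), and the proportional second-stage rule are assumptions standing in for the allocation the paper derives from its linear program. A first batch spreads part of the budget uniformly over all arms; a second batch concentrates the rest on the empirically promising arms.

```python
import numpy as np

def two_stage_batched_bai(sample_arm, n_arms, budget, stage1_frac=0.5, top_frac=0.2):
    """Toy two-stage batched best-arm identification (illustrative only)."""
    counts = np.zeros(n_arms, dtype=int)
    sums = np.zeros(n_arms)

    # Batch 1: uniform exploration over every arm.
    per_arm = max(1, int(stage1_frac * budget) // n_arms)
    for a in range(n_arms):
        draws = sample_arm(a, per_arm)
        counts[a] += per_arm
        sums[a] += draws.sum()
    means = sums / counts

    # Batch 2: keep the top arms and split the leftover budget among them,
    # weighting each survivor by how far it sits above the worst survivor.
    # (A stand-in for the LP-derived allocation described in the paper.)
    k = max(1, int(top_frac * n_arms))
    survivors = np.argsort(means)[-k:]
    remaining = budget - per_arm * n_arms
    weights = means[survivors] - means[survivors].min() + 1e-9
    alloc = np.floor(remaining * weights / weights.sum()).astype(int)
    for a, n_a in zip(survivors, alloc):
        if n_a > 0:
            draws = sample_arm(a, n_a)
            counts[a] += n_a
            sums[a] += draws.sum()

    return int(np.argmax(sums / counts))


# Usage: 50 Gaussian arms observed over 2 batches with 5000 total pulls.
rng = np.random.default_rng(0)
true_means = rng.normal(0.0, 1.0, size=50)
sample = lambda a, n: rng.normal(true_means[a], 1.0, size=n)
print(two_stage_batched_bai(sample, n_arms=50, budget=5000),
      int(np.argmax(true_means)))
```

In the paper's setting the second-stage sample sizes would come from solving the linear program rather than this proportional heuristic, and the natural baselines for comparison are the UCB-type and Thompson sampling methods mentioned in the abstract.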
Related papers
- The Batch Complexity of Bandit Pure Exploration [10.036727981085223]
In a pure exploration problem in multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms' distributions.
We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations.
We give an instance-dependent lower bound on the number of batches used by any sample efficient algorithm for any pure exploration task.
arXiv Detail & Related papers (2025-02-03T15:03:45Z) - Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids [28.547030766096956]
We introduce an algorithm that achieves near-optimal sample complexity and features an instance-sensitive batch complexity.
We extend this framework to the problem of batched best arm identification in linear bandits and achieve similar improvements.
arXiv Detail & Related papers (2025-01-29T01:40:36Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm of every objective in the shortest (expected) time subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [55.2480439325792]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
We introduce an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor.
We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Differential Good Arm Identification [4.666048091337632]
This paper targets a variant of the multi-armed bandit problem called good arm identification (GAI).
GAI is a pure-exploration bandit problem whose goal is to output as many good arms as possible using as few samples as possible.
We propose DGAI - a differentiable good arm identification algorithm.
arXiv Detail & Related papers (2023-03-13T14:28:21Z) - Dueling Bandits: From Two-dueling to Multi-dueling [40.4378338001229]
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms.
This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options.
arXiv Detail & Related papers (2022-11-16T22:00:54Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.