Semiparametric Best Arm Identification with Contextual Information
- URL: http://arxiv.org/abs/2209.07330v2
- Date: Mon, 19 Sep 2022 17:24:19 GMT
- Title: Semiparametric Best Arm Identification with Contextual Information
- Authors: Masahiro Kato and Masaaki Imaizumi and Takuya Ishihara and Toru
Kitagawa
- Abstract summary: We study best-arm identification with a fixed budget and contextual information in bandit problems.
We develop the "Contextual RS-AIPW strategy," which consists of the random sampling rule tracking a target allocation ratio and the recommendation rule.
Our proposed Contextual RS-AIPW strategy is optimal because the upper bound for the probability of misidentification matches the semiparametric lower bound when the budget goes to infinity.
- Score: 10.915684166086026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study best-arm identification with a fixed budget and contextual
(covariate) information in stochastic multi-armed bandit problems. In each
round, after observing contextual information, we choose a treatment arm using
past observations and current context. Our goal is to identify the best
treatment arm, a treatment arm with the maximal expected reward marginalized
over the contextual distribution, with a minimal probability of
misidentification. First, we derive semiparametric lower bounds for this
problem, where we regard the gaps between the expected rewards of the best and
suboptimal treatment arms as parameters of interest, and all other parameters,
such as the expected rewards conditioned on contexts, as the nuisance
parameters. We then develop the "Contextual RS-AIPW strategy," which consists
of the random sampling (RS) rule tracking a target allocation ratio and the
recommendation rule using the augmented inverse probability weighting (AIPW)
estimator. Our proposed Contextual RS-AIPW strategy is optimal because the
upper bound for the probability of misidentification matches the semiparametric
lower bound when the budget goes to infinity, and the gaps converge to zero.
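The recommendation rule rests on the AIPW estimator of each arm's mean reward marginalized over contexts. The following is a minimal illustrative sketch, not the paper's actual strategy: it uses a fixed allocation ratio, a simple per-arm least-squares fit as the nuisance regression, and omits the sequential/sample-split estimation the theory requires. The environment (linear context-dependent means) is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 20_000, 2                       # budget and number of arms
pi = np.array([0.5, 0.5])              # target allocation ratio (fixed here for simplicity)

# Illustrative environment: context-dependent means mu_a(x) = theta_a * x
theta = np.array([1.0, 1.2])
x = rng.uniform(0.5, 1.5, size=T)      # observed contexts
arms = rng.choice(K, size=T, p=pi)     # random sampling (RS) rule
y = theta[arms] * x + rng.normal(0.0, 1.0, size=T)  # observed rewards

# Crude nuisance estimate f_hat(a, x): per-arm least squares of y on x
f_hat = np.empty((K, T))
for a in range(K):
    m = arms == a
    slope = (x[m] * y[m]).sum() / (x[m] ** 2).sum()
    f_hat[a] = slope * x

# AIPW estimate of each arm's mean reward marginalized over the context distribution:
# (1/T) * sum_t [ 1{A_t = a} / pi(a) * (Y_t - f_hat(a, X_t)) + f_hat(a, X_t) ]
mu_aipw = np.array([
    np.mean((arms == a) / pi[a] * (y - f_hat[a]) + f_hat[a])
    for a in range(K)
])
best = int(np.argmax(mu_aipw))         # recommendation rule: arm with largest AIPW estimate
```

Here the true marginalized means are 1.0 and 1.2 (since the context has mean 1), so the AIPW estimates should land near those values and the recommendation should pick arm 1.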
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the mis-identification (of the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of various arms.
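For context on elimination-style fixed-budget algorithms, here is a sketch of the classical Successive Rejects procedure (Audibert et al.), which rejects one arm per phase rather than at any round as the adaptive variant above does; the `pull` callback and parameter names are my own illustration.

```python
import numpy as np

def successive_rejects(pull, K, budget):
    """Classical Successive Rejects sketch for fixed-budget best-arm identification.

    `pull(a)` returns one stochastic reward for arm `a`.
    The budget is split into K-1 phases; at the end of each phase the arm
    with the lowest empirical mean among the survivors is rejected.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    counts = np.zeros(K)
    sums = np.zeros(K)
    n_prev = 0
    for k in range(1, K):
        # target per-arm sample count for phase k
        n_k = int(np.ceil((budget - K) / (log_bar * (K + 1 - k))))
        for a in active:
            for _ in range(n_k - n_prev):
                counts[a] += 1
                sums[a] += pull(a)
        # reject the surviving arm with the lowest empirical mean
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
        n_prev = n_k
    return active[0]
```

With three Bernoulli arms of means 0.1, 0.2, and 0.9 and a budget of 3000, the procedure reliably returns the last arm.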
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z) - Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget.
We propose an algorithm called Constrained-SR based on the Successive Rejects framework.
We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
arXiv Detail & Related papers (2022-11-27T08:58:16Z) - Beyond the Best: Estimating Distribution Functionals in Infinite-Armed
Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z) - Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a randomized sampling (RS) rule that draws arms according to estimated target allocation probabilities.
We show that the proposed strategy is agnostically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
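In the two-armed Gaussian setting, a natural target allocation samples each arm in proportion to its standard deviation (a Neyman-type allocation). The sketch below is my own simplified illustration of randomized sampling that tracks such an estimated allocation, not the paper's exact strategy; the forced initial pulls and variance floor are ad-hoc safeguards I added.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = np.array([0.0, 0.1]), np.array([1.0, 2.0])  # unknown to the learner
T = 10_000

counts = np.zeros(2)
sums = np.zeros(2)
sq = np.zeros(2)
for t in range(T):
    if counts.min() < 10:
        # force a few pulls of each arm so variance estimates are usable
        a = int(np.argmin(counts))
    else:
        mean_hat = sums / counts
        var_hat = np.maximum(sq / counts - mean_hat ** 2, 1e-12)
        sd_hat = np.sqrt(var_hat)
        p = sd_hat / sd_hat.sum()      # estimated Neyman-type target allocation
        a = rng.choice(2, p=p)         # randomized sampling (RS) rule
    r = rng.normal(mu[a], sigma[a])
    counts[a] += 1
    sums[a] += r
    sq[a] += r * r

recommend = int(np.argmax(sums / counts))
```

Since the true standard deviations are 1 and 2, the empirical allocation should drift toward roughly one third versus two thirds of the budget.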
arXiv Detail & Related papers (2022-01-12T13:38:33Z) - The Role of Contextual Information in Best Arm Identification [13.651941268805693]
We study the best-arm identification problem with fixed confidence when contextual information is available in bandits.
We show the instance-specific sample complexity lower bounds for the problem.
We experimentally confirm that context information contributes to faster best-arm identification.
arXiv Detail & Related papers (2021-06-26T18:39:38Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.