Robust Best-arm Identification in Linear Bandits
- URL: http://arxiv.org/abs/2311.04731v1
- Date: Wed, 8 Nov 2023 14:58:11 GMT
- Title: Robust Best-arm Identification in Linear Bandits
- Authors: Wei Wang, Sattar Vakili, Ilija Bogunovic
- Abstract summary: We study the robust best-arm identification problem (RBAI) in the case of linear rewards.
We present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards.
Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients.
- Score: 25.91361349646875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the robust best-arm identification problem (RBAI) in the case of
linear rewards. The primary objective is to identify a near-optimal robust arm,
which involves selecting arms at every round and assessing their robustness by
exploring potential adversarial actions. This approach is particularly relevant
when utilizing a simulator and seeking to identify a robust solution for
real-world transfer. To this end, we present an instance-dependent lower bound
for the robust best-arm identification problem with linear rewards.
Furthermore, we propose both static and adaptive bandit algorithms that achieve
sample complexity that matches the lower bound. In synthetic experiments, our
algorithms effectively identify the best robust arm and perform similarly to
the oracle strategy. As an application, we examine diabetes care and the
process of learning insulin dose recommendations that are robust with respect
to inaccuracies in standard calculators. Our algorithms prove to be effective
in identifying robust dosage values across various age ranges of patients.
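The abstract's robust objective can be illustrated with a minimal sketch (my own toy construction, not the paper's algorithm): with linear rewards, if the adversary may shift the unknown parameter within an L2 ball of radius eps, the worst-case reward of an arm x has the closed form x·θ − eps·‖x‖, so the robust best arm maximizes that penalized value. All names and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): K arms in R^d, unknown theta,
# adversary may shift theta anywhere in an L2 ball of radius eps.
d, K, eps, n_pulls = 3, 5, 0.5, 2000
theta = rng.normal(size=d)          # true parameter (unknown to the learner)
arms = rng.normal(size=(K, d))      # arm feature vectors

# Static allocation: pull each arm equally (uniform pulls keep the sketch
# minimal; an optimal-design allocation would be the more refined choice)
# and record noisy linear rewards.
X = np.repeat(arms, n_pulls // K, axis=0)
y = X @ theta + rng.normal(scale=0.1, size=len(X))

# Least-squares estimate of theta from the observed pulls.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Robust value of arm x under the worst-case shift:
#   min_{||delta|| <= eps} x @ (theta_hat + delta) = x @ theta_hat - eps * ||x||
robust_values = arms @ theta_hat - eps * np.linalg.norm(arms, axis=1)
robust_arm = int(np.argmax(robust_values))
```

The closed form follows from Cauchy-Schwarz: the minimizing shift is delta = −eps·x/‖x‖. Note the robust best arm can differ from the nominal best arm, since long feature vectors are penalized more heavily.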
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward, at a specified accuracy, as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (a less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a specific and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - SPRT-based Efficient Best Arm Identification in Stochastic Bandits [31.359578768463752]
This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing.
arXiv Detail & Related papers (2022-07-22T15:54:53Z) - Active Learning with Safety Constraints [25.258564629480063]
We investigate the complexity of learning the best safe decision in interactive environments.
We propose an adaptive experimental design-based algorithm, which we show efficiently trades off between the difficulty of showing an arm is unsafe versus suboptimal.
arXiv Detail & Related papers (2022-06-22T15:45:38Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.