Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed
Bandits
- URL: http://arxiv.org/abs/2008.13629v2
- Date: Mon, 28 Mar 2022 02:49:25 GMT
- Title: Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed
Bandits
- Authors: Anmol Kagrecha, Jayakrishnan Nair, and Krishna Jagannathan
- Abstract summary: We show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified.
Our key contributions are: (i) We establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) We propose two classes of algorithms that are asymptotically near-optimal.
- Score: 4.760079434948198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional multi-armed bandit (MAB) formulations usually make certain
assumptions about the underlying arms' distributions, such as bounds on the
support or their tail behaviour. Moreover, such parametric information is
usually 'baked' into the algorithms. In this paper, we show that specialized
algorithms that exploit such parametric information are prone to inconsistent
learning performance when the parameter is misspecified. Our key contributions
are twofold: (i) We establish fundamental performance limits of statistically
robust MAB algorithms under the fixed-budget pure exploration setting, and (ii)
We propose two classes of algorithms that are asymptotically near-optimal.
Additionally, we consider a risk-aware criterion for best arm identification,
where the objective associated with each arm is a linear combination of the
mean and the conditional value at risk (CVaR). Throughout, we make a very mild
'bounded moment' assumption, which lets us work with both light-tailed and
heavy-tailed distributions within a unified framework.
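The risk-aware criterion in the abstract scores each arm by a linear combination of its mean and its conditional value at risk (CVaR). As a minimal illustration, the following sketch estimates such an objective from loss samples using a simple empirical CVaR estimator (average of the worst alpha-fraction of observations); the weight names `xi1`, `xi2` and the level `alpha` are illustrative placeholders, not the paper's notation or estimator.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.05):
    """Average of the worst (largest) alpha-fraction of losses.

    A simple empirical CVaR estimator, for illustration only; the
    paper may use a different estimator with robustness guarantees.
    """
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))
    return losses[:k].mean()

def risk_objective(losses, xi1=1.0, xi2=1.0, alpha=0.05):
    # Linear combination of mean loss and CVaR; smaller is better
    # when samples are interpreted as losses.
    return xi1 * np.mean(losses) + xi2 * empirical_cvar(losses, alpha)
```

For identical weights, an arm with a heavier loss tail is penalized through the CVaR term even if its mean matches another arm's, which is the risk-averse behaviour the criterion is designed to capture.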
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget.
We propose an algorithm called Constrained-SR based on the Successive Rejects framework.
We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
arXiv Detail & Related papers (2022-11-27T08:58:16Z) - Robust Contextual Linear Bandits [19.85979744859435]
This paper studies a common form of misspecification, an inter-arm heterogeneity that is not captured by context.
We develop two efficient bandit algorithms for our setting: a UCB algorithm called RoLinUCB and a posterior-sampling algorithm called RoLinTS.
arXiv Detail & Related papers (2022-10-26T05:18:09Z) - Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
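Several of the papers above work in the fixed-budget pure-exploration setting and build on the Successive Rejects framework. As background, here is a minimal sketch of the standard Successive Rejects phase-budget allocation (Audibert & Bubeck style, not any specific variant from the papers listed): with budget n and K arms, the algorithm runs K-1 phases, eliminating the empirically worst arm at the end of each.

```python
import math

def sr_phase_budgets(n, K):
    """Per-arm pull increments for each of the K-1 Successive Rejects phases.

    Standard allocation: overline{log}(K) = 1/2 + sum_{i=2}^{K} 1/i, and the
    cumulative pull target after phase k is
        n_k = ceil((n - K) / (overline{log}(K) * (K + 1 - k))).
    Returns the additional pulls each surviving arm receives in phase k.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    increments, n_prev = [], 0
    for k in range(1, K):
        n_k = math.ceil((n - K) / (log_bar * (K + 1 - k)))
        increments.append(n_k - n_prev)
        n_prev = n_k
    return increments
```

The allocation is designed so that total pulls never exceed the budget n; later phases pull the fewer surviving arms more heavily, concentrating samples on the harder comparisons.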
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.