Robust Outlier Arm Identification
- URL: http://arxiv.org/abs/2009.09988v1
- Date: Mon, 21 Sep 2020 16:13:01 GMT
- Title: Robust Outlier Arm Identification
- Authors: Yinglun Zhu, Sumeet Katariya and Robert Nowak
- Abstract summary: We study the problem of Robust Outlier Arm Identification (ROAI).
The goal is to identify arms whose expected rewards deviate substantially from the majority.
We compute the outlier threshold using the median and median absolute deviation of the expected rewards.
- Score: 16.21284542559277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of Robust Outlier Arm Identification (ROAI), where the
goal is to identify arms whose expected rewards deviate substantially from the
majority, by adaptively sampling from their reward distributions. We compute
the outlier threshold using the median and median absolute deviation of the
expected rewards. This is a robust choice for the threshold compared to using
the mean and standard deviation, since it can identify outlier arms even in the
presence of extreme outlier values. Our setting is different from existing pure
exploration problems where the threshold is pre-specified as a given value or
rank. This is useful in applications where the goal is to identify the set of
promising items but the cardinality of this set is unknown, such as finding
promising drugs for a new disease or identifying items favored by a population.
We propose two $\delta$-PAC algorithms for ROAI, which include the first
UCB-style algorithm for outlier detection, and derive upper bounds on their
sample complexity. We also prove a worst-case lower bound for the problem that
matches our upper bounds up to logarithmic factors, indicating that they are
generally unimprovable. Experimental results show that our algorithms are both
robust and about $5\times$ more sample efficient than the state-of-the-art.
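The median/MAD threshold described in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact estimator: the multiplier `k` is a hypothetical choice, and in the actual ROAI setting the arm means are unknown and must be estimated adaptively.

```python
import numpy as np

def outlier_threshold(means, k=3.0):
    """Robust outlier threshold from (estimated) arm means.

    Uses the median and the median absolute deviation (MAD), as in the
    ROAI setting; k is an illustrative multiplier, not a value from the paper.
    """
    means = np.asarray(means, dtype=float)
    med = np.median(means)
    mad = np.median(np.abs(means - med))
    return med + k * mad

# Arms 0-4 cluster near 1.0; arm 5 is an extreme outlier.
means = [0.9, 1.0, 1.1, 1.0, 0.95, 10.0]
theta = outlier_threshold(means)            # 1.0 + 3 * 0.075 = 1.225
outliers = [i for i, m in enumerate(means) if m > theta]
```

Note the robustness property the abstract refers to: a mean/standard-deviation threshold would be inflated by the extreme value 10.0 itself, while the median and MAD are barely affected by it.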
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap [4.666048091337632]
Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm.
This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold.
We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm.
arXiv Detail & Related papers (2024-01-29T04:21:47Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm suffers only an $\tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
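The classic successive-elimination scheme that the second algorithm builds on can be sketched as follows. This is the standard textbook version for illustration, not the contamination-robust variant from the paper (which must additionally guard against adversarially corrupted rewards).

```python
import math
import random

def successive_elimination(pulls, delta=0.05, rounds=500):
    """Minimal successive-elimination sketch for best-arm identification.

    Each round pulls every surviving arm once, then eliminates arms whose
    UCB falls below the empirically best arm's LCB."""
    n = len(pulls)
    active = set(range(n))
    counts = [0] * n
    sums = [0.0] * n
    for r in range(1, rounds + 1):
        for i in active:
            sums[i] += pulls[i]()
            counts[i] += 1
        # Anytime confidence radius shared by all surviving arms.
        rad = math.sqrt(math.log(4 * n * r * r / delta) / (2 * r))
        best_lcb = max(sums[i] / counts[i] for i in active) - rad
        active = {i for i in active if sums[i] / counts[i] + rad >= best_lcb}
        if len(active) == 1:
            return active.pop()
    # Budget exhausted: fall back to the empirically best survivor.
    return max(active, key=lambda i: sums[i] / counts[i])

random.seed(1)
arms = [lambda: float(random.random() < 0.2),
        lambda: float(random.random() < 0.9),   # best arm
        lambda: float(random.random() < 0.5)]
best = successive_elimination(arms)
```

Under reward contamination, the empirical means used here can be arbitrarily skewed by corrupted samples, which is why the paper's variants replace them with robust estimates.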
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Adaptive Double-Exploration Tradeoff for Outlier Detection [31.428683644520046]
We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection.
The objective is to identify the outliers whose rewards are above a threshold.
By automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm.
arXiv Detail & Related papers (2020-05-13T00:12:31Z) - The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.