Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits
- URL: http://arxiv.org/abs/2211.01743v1
- Date: Tue, 1 Nov 2022 18:20:10 GMT
- Title: Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits
- Authors: Yifei Wang, Tavor Baharav, Yanjun Han, Jiantao Jiao, David Tse
- Abstract summary: In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
- Score: 40.71199236098642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the infinite-armed bandit problem, each arm's average reward is sampled
from an unknown distribution, and each arm can be sampled further to obtain
noisy estimates of the average reward of that arm. Prior work focuses on
identifying the best arm, i.e., estimating the maximum of the average reward
distribution. We consider a general class of distribution functionals beyond
the maximum, and propose unified meta algorithms for both the offline and
online settings, achieving optimal sample complexities. We show that online
estimation, where the learner can sequentially choose whether to sample a new
or existing arm, offers no advantage over the offline setting for estimating
the mean functional, but significantly reduces the sample complexity for other
functionals such as the median, maximum, and trimmed mean. The matching lower
bounds utilize several different Wasserstein distances. For the special case of
median estimation, we identify a curious thresholding phenomenon on the
indistinguishability between Gaussian convolutions with respect to the noise
level, which may be of independent interest.
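As a concrete illustration of the offline setting described in the abstract, the sketch below draws a batch of fresh arms, pulls each one a fixed number of times, and applies the target functional (mean, median, ...) to the empirical arm means. This is a hedged plug-in baseline, not the paper's meta-algorithm; `arm_prior` and `pull_arm` are hypothetical stand-ins for the unknown reward-mean distribution and the noisy sampling oracle.

```python
import random
import statistics

def offline_functional_estimate(arm_prior, pull_arm, n_arms, pulls_per_arm, functional):
    """Plug-in estimate of a distribution functional: sample fresh arms,
    estimate each arm's average reward from repeated noisy pulls, then
    apply the functional to the list of empirical means."""
    empirical_means = []
    for _ in range(n_arms):
        theta = arm_prior()  # latent average reward of a freshly drawn arm
        rewards = [pull_arm(theta) for _ in range(pulls_per_arm)]
        empirical_means.append(sum(rewards) / pulls_per_arm)
    return functional(empirical_means)

random.seed(0)
arm_prior = lambda: random.uniform(0.0, 1.0)             # hypothetical reward-mean law
pull_arm = lambda theta: theta + random.gauss(0.0, 0.1)  # noisy pull of one arm

est_mean = offline_functional_estimate(arm_prior, pull_arm, 2000, 5, statistics.mean)
est_median = offline_functional_estimate(arm_prior, pull_arm, 2000, 5, statistics.median)
```

In the online setting the learner could instead decide adaptively whether to pull a new or an existing arm; the abstract's point is that this adaptivity helps for functionals such as the median, maximum, and trimmed mean, but not for the mean.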
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
- Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm of every objective in the shortest (expected) time, subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve a max-min optimisation problem at every step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z)
- Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification [10.470114319701576]
We prove the minimax optimality of the Neyman allocation for the simple regret.
We show that our optimality result holds without imposing locality restrictions based on local asymptotic normality.
arXiv Detail & Related papers (2024-12-23T18:06:20Z)
- Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits [35.35226227009685]
Good arm identification (GAI) is a practical bandit inference objective that aims to label arms with means above a threshold as quickly as possible.
We show that GAI can be efficiently solved by combining a reward-maximizing sampling algorithm with a novel anytime-valid sequential test for labeling arm means.
Our empirical results validate our approach beyond the minimax setting, reducing the expected number of samples for all stopping times by at least 50% across both synthetic and real-world settings.
arXiv Detail & Related papers (2024-10-21T01:19:23Z)
- Learning Counterfactual Distributions via Kernel Nearest Neighbors [8.362793507416177]
We introduce a kernel-based distributional generalization of nearest neighbors to estimate the underlying distributions.
We demonstrate that our nearest neighbors approach is robust to heteroscedastic noise, provided we have access to two or more measurements.
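A minimal sketch of the nearest-neighbors idea (without the kernel weighting the paper actually uses): pool the repeated measurements of the units closest to the query covariate into one empirical distribution. All names and data here are illustrative.

```python
def knn_empirical_distribution(data, query, k):
    """Pool the outcome samples of the k units nearest to the query
    covariate into one sorted empirical distribution.  A plain
    unweighted sketch, not the paper's kernel-based estimator."""
    ranked = sorted(data, key=lambda record: abs(record[0] - query))
    pooled = [y for _, outcomes in ranked[:k] for y in outcomes]
    return sorted(pooled)

# each unit: (covariate, two or more noisy outcome measurements)
data = [(0.0, [1.0, 1.2]), (0.1, [1.1, 0.9]),
        (0.9, [3.0, 3.2]), (1.0, [3.1, 2.9])]
dist = knn_empirical_distribution(data, query=0.05, k=2)  # pools the two nearest units
```

Having two or more measurements per unit is what lets the pooled sample separate the signal from heteroscedastic noise, as the summary above notes.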
arXiv Detail & Related papers (2024-10-17T09:36:01Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Best Arm Identification in Bandits with Limited Precision Sampling [14.011731120150124]
We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection.
We propose a modified tracking-based algorithm to handle non-unique optimal allocations.
arXiv Detail & Related papers (2023-05-10T12:07:48Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
We characterize almost-optimal algorithms: those whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant.
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
For the gap-based algorithm the sample complexity is optimal up to constant factors, while for successive elimination it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
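The successive-elimination idea mentioned in the entry above can be illustrated with a textbook sketch (not the paper's contamination-robust variant): pull every surviving arm once per round and eliminate arms whose upper confidence bound falls below the best arm's lower confidence bound. The confidence radius and constants here are standard sub-Gaussian choices picked for illustration.

```python
import math
import random

def successive_elimination(pull, n_arms, delta, rounds):
    """Generic successive elimination: each round, pull every surviving
    arm once, then drop arms whose upper confidence bound is below the
    best surviving arm's lower confidence bound."""
    active = set(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(1, rounds + 1):
        for a in active:
            sums[a] += pull(a)
            counts[a] += 1
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        means = {a: sums[a] / counts[a] for a in active}
        best_lcb = max(means[a] - radius for a in active)
        active = {a for a in active if means[a] + radius >= best_lcb}
        if len(active) == 1:
            break
    return max(active, key=lambda a: sums[a] / counts[a])

random.seed(1)
true_means = [0.2, 0.5, 0.8]                       # illustrative arm means
pull = lambda a: true_means[a] + random.gauss(0.0, 0.1)
best = successive_elimination(pull, n_arms=3, delta=0.05, rounds=500)
```

With well-separated means and enough rounds, only the best arm survives the elimination; the gap-based algorithm in the entry above instead targets the arm gaps directly.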
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.