Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances
- URL: http://arxiv.org/abs/2306.07549v1
- Date: Tue, 13 Jun 2023 05:41:38 GMT
- Title: Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances
- Authors: Anusha Lalitha, Kousha Kalantari, Yifei Ma, Anoop Deoras, Branislav
Kveton
- Abstract summary: We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances.
We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances.
- Score: 12.00630538470713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of best-arm identification (BAI) in the fixed-budget
setting with heterogeneous reward variances. We propose two variance-adaptive
BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar
for unknown reward variances. Our algorithms rely on non-uniform budget
allocations among the arms where the arms with higher reward variances are
pulled more often than those with lower variances. The main algorithmic novelty
is in the design of SHAdaVar, which allocates budget greedily based on
overestimating the unknown reward variances. We bound probabilities of
misidentifying the best arms in both SHVar and SHAdaVar. Our analyses rely on
novel lower bounds on the number of pulls of an arm that do not require
closed-form solutions to the budget allocation problem. Since one of our budget
allocation problems is analogous to the optimal experiment design with unknown
variances, we believe that our results are of broad interest. Our experiments
validate our theory and show that SHVar and SHAdaVar outperform algorithms
from prior works with analytical guarantees.
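To make the allocation idea concrete, here is a minimal Python sketch of a SHVar-style successive-halving loop, assuming the per-stage budget is split among surviving arms in proportion to their known variances (one natural reading of the abstract; the exact allocation rule and the `pull` interface are illustrative assumptions, not the paper's pseudocode):

```python
import numpy as np

def shvar_sketch(pull, K, budget, variances):
    """Variance-adaptive successive halving, a minimal sketch of the
    SHVar idea: split each stage's budget among surviving arms in
    proportion to their (known) reward variances, then drop the worse
    half by empirical mean. `pull(a)` returns one stochastic reward."""
    arms = list(range(K))
    n_stages = max(1, int(np.ceil(np.log2(K))))
    stage_budget = budget // n_stages
    means = np.zeros(K)
    counts = np.zeros(K, dtype=int)
    for _ in range(n_stages):
        if len(arms) == 1:
            break
        # Pull high-variance surviving arms more often.
        w = np.array([variances[a] for a in arms], dtype=float)
        pulls = np.maximum(1, (stage_budget * w / w.sum()).astype(int))
        for a, n_a in zip(arms, pulls):
            for _ in range(n_a):
                r = pull(a)
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
        # Keep the better half by empirical mean.
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0]
```

SHAdaVar would follow the same skeleton, but with `variances` replaced by optimistic overestimates computed from the rewards observed so far, as the abstract describes.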
Related papers
- Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z)
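Since a mixed arm is a distribution over the $K$ base arms, the offline version of this objective is a linear program. Below is a hedged plug-in sketch of that formulation (my reading of the summary, not the paper's CBMAI algorithm; `mu_hat`, `cost_hat`, and `budgets` are assumed inputs):

```python
import numpy as np
from scipy.optimize import linprog

def best_mixed_arm(mu_hat, cost_hat, budgets):
    """Plug-in LP: find a distribution p over K arms maximizing
    p @ mu_hat subject to p @ cost_hat[:, j] <= budgets[j] for each
    cost dimension j. mu_hat: (K,) estimated rewards; cost_hat: (K, J)
    estimated costs. Illustrative formulation only."""
    K, J = cost_hat.shape
    res = linprog(
        c=-mu_hat,                    # linprog minimizes, so negate
        A_ub=cost_hat.T,              # J x K cost constraints
        b_ub=np.asarray(budgets),
        A_eq=np.ones((1, K)),         # p sums to one
        b_eq=[1.0],
        bounds=[(0.0, 1.0)] * K,
    )
    return res.x                      # the mixed arm (often small support)
```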
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted $\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances [10.470114319701576]
We propose a strategy that estimates variances during an adaptive experiment and draws arms in proportion to the estimated standard deviations.
Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs estimated variances, is asymptotically optimal even when the variances are unknown.
arXiv Detail & Related papers (2023-12-20T03:28:49Z)
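A minimal sketch of such a strategy, assuming it tracks the two-armed Neyman allocation ratio of estimated standard deviations (the tracking rule and the `pull` interface are illustrative, not the authors' pseudocode):

```python
import numpy as np

def neyman_track(pull, budget):
    """Two-armed sketch: keep running standard deviation estimates and
    pull the arm whose empirical pull fraction lags its Neyman target
    sd_a / (sd_0 + sd_1), then recommend the empirical best arm."""
    rewards = [[pull(0), pull(0)], [pull(1), pull(1)]]   # forced pulls
    for _ in range(budget - 4):
        sd = np.array([max(np.std(r, ddof=1), 1e-6) for r in rewards])
        target = sd / sd.sum()                 # Neyman allocation ratio
        frac = np.array([len(r) for r in rewards], dtype=float)
        frac /= frac.sum()
        a = int(np.argmax(target - frac))      # track the target ratio
        rewards[a].append(pull(a))
    return int(np.argmax([np.mean(r) for r in rewards]))
```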
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is asymptotically optimal in the sense that its probability of misidentification aligns with the lower bounds.
arXiv Detail & Related papers (2023-10-30T17:52:46Z)
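A hedged sketch of a GNA-EBA-style fixed design, assuming the budget is split in proportion to the known standard deviations, generalizing two-armed Neyman allocation (the paper's exact allocation ratios may differ):

```python
import numpy as np

def gna_eba_sketch(pull, sigmas, budget):
    """Split the budget across arms in proportion to their known
    standard deviations (an assumed generalization of Neyman
    allocation), then recommend the empirical best arm."""
    sigmas = np.asarray(sigmas, dtype=float)
    pulls = np.maximum(1, np.round(budget * sigmas / sigmas.sum()).astype(int))
    means = np.array([np.mean([pull(a) for _ in range(n_a)])
                      for a, n_a in enumerate(pulls)])
    return int(np.argmax(means))
```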
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals [0.9831489366502302]
We study the Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs.
We propose a new upper confidence bound (UCB) sampling policy, $\omega$-UCB, that uses asymmetric confidence intervals.
These intervals scale with the distance between the sample mean and the bounds of the random variable, yielding a tighter, more accurate estimation of the reward-cost ratio.
arXiv Detail & Related papers (2023-06-12T12:35:16Z)
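For intuition, here is a generic budgeted-UCB sketch that plays the arm with the best optimistic reward-cost ratio; the intervals below are plain Hoeffding bounds clipped to an assumed known $[0, 1]$ support (already asymmetric near the boundary), standing in for the paper's $\omega$-UCB intervals, which are shaped differently and tighter:

```python
import math

def ratio_ucb_sketch(pull, K, budget):
    """Budgeted-MAB sketch: play the arm maximizing the optimistic
    ratio UCB(reward) / LCB(cost). `pull(a)` returns a (reward, cost)
    pair in [0, 1]^2 with strictly positive expected cost."""
    n, rew, cost = [0] * K, [0.0] * K, [0.0] * K
    t, spent, total_reward = 0, 0.0, 0.0
    while spent < budget:
        t += 1
        if t <= K:
            a = t - 1                                   # play each arm once
        else:
            def score(a):
                rad = math.sqrt(2.0 * math.log(t) / n[a])
                r_ucb = min(1.0, rew[a] / n[a] + rad)   # clipped upper bound
                c_lcb = max(1e-6, cost[a] / n[a] - rad) # clipped lower bound
                return r_ucb / c_lcb
            a = max(range(K), key=score)
        r, c = pull(a)
        n[a] += 1
        rew[a] += r
        cost[a] += c
        spent += c
        total_reward += r
    return total_reward
```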
- Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling [44.921905700729766]
Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms.
This motivated prior works on variance-adaptive frequentist algorithms, which have strong instance-dependent regret bounds.
We lay foundations for the Bayesian setting, which incorporates prior knowledge.
arXiv Detail & Related papers (2023-03-16T02:07:29Z)
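A minimal sketch of the variance-adaptive Bayesian idea, assuming Gaussian rewards with known per-arm variances and a conjugate normal posterior (an illustration, not the paper's algorithm):

```python
import numpy as np

def var_adaptive_ts(pull, variances, horizon, prior_mu=0.0, prior_var=1.0,
                    rng=None):
    """Gaussian Thompson sampling with known per-arm reward variances:
    arms with larger variance keep wider posteriors, so they are
    explored more before being discarded."""
    rng = rng or np.random.default_rng()
    K = len(variances)
    prec = np.full(K, 1.0 / prior_var)     # posterior precisions
    mean = np.full(K, prior_mu)            # posterior means
    for _ in range(horizon):
        sample = rng.normal(mean, 1.0 / np.sqrt(prec))
        a = int(np.argmax(sample))
        r = pull(a)
        obs_prec = 1.0 / variances[a]      # conjugate normal update
        mean[a] = (prec[a] * mean[a] + obs_prec * r) / (prec[a] + obs_prec)
        prec[a] += obs_prec
    return mean
```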
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
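The workhorse behind most heteroscedastic linear bandit algorithms is a variance-weighted ridge estimator; a short sketch of that standard construction (my illustration, not the paper's exact procedure):

```python
import numpy as np

def weighted_ridge(X, r, sigma2, lam=1.0):
    """Variance-weighted ridge regression: each observation (x_k, r_k)
    is weighted by 1 / sigma_k^2, so low-noise rounds dominate the fit.
    X: (n, d) contexts, r: (n,) rewards, sigma2: (n,) noise variances."""
    d = X.shape[1]
    w = 1.0 / np.asarray(sigma2)
    A = X.T @ (X * w[:, None]) + lam * np.eye(d)   # weighted Gram matrix
    b = X.T @ (r * w)
    theta_hat = np.linalg.solve(A, b)
    return theta_hat, A      # A also defines the confidence ellipsoid
```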
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations of 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with those of reward maximization.
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
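One simple way to combine the two goals, shown as a hedged sketch below: mix a uniform-assignment floor (preserving data for clean statistical analysis) with Beta-Bernoulli Thompson sampling (for reward). This is an illustrative mixture, not necessarily either of the paper's two algorithms:

```python
import numpy as np

def ts_with_uniform_floor(pull, K, horizon, eps=0.2, rng=None):
    """With probability eps assign uniformly at random; otherwise use
    Beta-Bernoulli Thompson sampling. `pull(a)` returns 0 or 1."""
    rng = rng or np.random.default_rng()
    succ = np.ones(K)                    # Beta(1, 1) priors
    fail = np.ones(K)
    log = []                             # (arm, reward, was_uniform)
    for _ in range(horizon):
        uniform = rng.random() < eps
        if uniform:
            a = int(rng.integers(K))
        else:
            a = int(np.argmax(rng.beta(succ, fail)))
        r = pull(a)
        succ[a] += r
        fail[a] += 1 - r
        log.append((a, r, uniform))      # uniform rows support unbiased tests
    return log
```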
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
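A hedged sketch of the mean-variance Thompson sampling idea for Gaussian arms, assuming a Normal-Inverse-Gamma posterior per arm and the sampled score $\mu - \rho\sigma^2$ (an illustrative variant; the paper's algorithms and risk criterion details may differ):

```python
import numpy as np

def mean_variance_ts(pull, K, horizon, rho=1.0, rng=None):
    """Sample (mu, sigma^2) from each arm's Normal-Inverse-Gamma
    posterior and play the arm maximizing mu - rho * sigma^2."""
    rng = rng or np.random.default_rng()
    # NIG hyperparameters per arm: mu0, kappa, alpha, beta.
    mu0 = np.zeros(K)
    kappa = np.ones(K)
    alpha = np.ones(K)
    beta = np.ones(K)
    for _ in range(horizon):
        var = 1.0 / rng.gamma(alpha, 1.0 / beta)       # sigma^2 ~ InvGamma
        mu = rng.normal(mu0, np.sqrt(var / kappa))     # mu | sigma^2
        a = int(np.argmax(mu - rho * var))             # mean-variance score
        r = pull(a)
        # Standard NIG update for one Gaussian observation.
        beta[a] += 0.5 * kappa[a] * (r - mu0[a]) ** 2 / (kappa[a] + 1.0)
        mu0[a] = (kappa[a] * mu0[a] + r) / (kappa[a] + 1.0)
        kappa[a] += 1.0
        alpha[a] += 0.5
    return mu0
```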
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.