Tuning Confidence Bound for Stochastic Bandits with Bandit Distance
- URL: http://arxiv.org/abs/2110.02690v1
- Date: Wed, 6 Oct 2021 12:24:07 GMT
- Title: Tuning Confidence Bound for Stochastic Bandits with Bandit Distance
- Authors: Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado
- Abstract summary: "Distance tuning" of the standard UCB is done using a proposed distance measure.
"Exploration Bargain Point" gives insights into the tradeoffs between exploration and exploitation.
- Score: 5.818764911456228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel modification of the standard upper confidence bound (UCB)
method for the stochastic multi-armed bandit (MAB) problem that tunes the
confidence bound of a given bandit based on its distance to others. Our UCB
distance tuning (UCB-DT) formulation improves performance, as measured by
expected regret, by preventing the MAB algorithm from focusing on non-optimal
bandits, a well-known deficiency of standard UCB. "Distance tuning" of
the standard UCB is done using a proposed distance measure, which we call
bandit distance; this measure is parameterizable and can therefore be optimized
to control the transition rate from exploration to exploitation based on
problem requirements. We empirically demonstrate increased performance of
UCB-DT over many existing state-of-the-art methods that use the UCB
formulation for the MAB problem. Our contribution also includes the development
of a conceptual tool called the "Exploration Bargain Point" which gives
insights into the tradeoffs between exploration and exploitation. We argue that
the Exploration Bargain Point provides an intuitive perspective that is useful
for comparatively analyzing the performance of UCB-based methods.
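
To make the idea concrete, here is a minimal, illustrative sketch of a distance-tuned UCB arm-selection rule. The abstract does not specify the paper's exact bandit distance or tuning formula, so the `_distance` function and the exponential shrinkage of the exploration bonus below are assumptions for illustration, not the authors' method.

```python
import numpy as np

class DistanceTunedUCB:
    """Illustrative sketch of a distance-tuned UCB policy.

    This sketch shrinks an arm's exploration bonus as its empirical
    distance from the current best arm grows, mimicking the stated goal
    of not lingering on non-optimal bandits. The actual UCB-DT tuning
    rule is defined in the paper, not reproduced here.
    """

    def __init__(self, n_arms: int, c: float = 2.0):
        self.counts = np.zeros(n_arms)   # number of pulls per arm
        self.means = np.zeros(n_arms)    # empirical mean reward per arm
        self.c = c                       # exploration constant

    def _distance(self, i: int) -> float:
        # Hypothetical "bandit distance": gap between arm i's empirical
        # mean and the empirical best arm's mean.
        return float(self.means.max() - self.means[i])

    def select_arm(self, t: int) -> int:
        # Pull each arm once before applying the tuned confidence bound.
        untried = np.where(self.counts == 0)[0]
        if untried.size > 0:
            return int(untried[0])
        bonus = np.sqrt(self.c * np.log(t) / self.counts)  # standard UCB bonus
        dist = np.array([self._distance(i) for i in range(len(self.counts))])
        tuning = np.exp(-dist * self.counts)  # assumed rule: distant arms explore less
        return int(np.argmax(self.means + tuning * bonus))

    def update(self, arm: int, reward: float) -> None:
        # Incremental update of the pulled arm's empirical mean.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

A caller would invoke `select_arm(t)` at each round t and feed the observed reward back through `update`; setting the tuning factor identically to 1 recovers standard UCB.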
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The problem setting elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The proposed Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- UCB Exploration for Fixed-Budget Bayesian Best Arm Identification [0.0]
We study best-arm identification (BAI) in the fixed-budget setting.
We propose a UCB exploration algorithm that is both theoretically and empirically efficient for the fixed-budget BAI problem in a Bayesian setting.
arXiv Detail & Related papers (2024-08-09T05:15:36Z)
- Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits [0.0]
We propose a new distribution-free, data-driven UCB algorithm for symmetric reward distributions.
We prove a near-optimal regret bound for the proposed anytime, parameter-free RMM-UCB method, even for heavy-tailed distributions.
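The blurb does not describe RMM-UCB's construction; as background, a standard robust mean estimator used in heavy-tailed bandit analyses is median-of-means, sketched below. This is an assumption-level illustration of the general technique, not the paper's estimator.

```python
import numpy as np

def median_of_means(samples: np.ndarray, n_blocks: int = 5) -> float:
    """Median-of-means estimator: split the samples into blocks, average
    each block, and return the median of the block means. Unlike the plain
    sample mean, it concentrates well under heavy-tailed noise."""
    n_blocks = max(1, min(n_blocks, len(samples)))  # guard against tiny samples
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```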
arXiv Detail & Related papers (2024-06-09T10:06:50Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits [16.59103967569845]
We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for contextual linear bandits in non-stationary environments.
This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings.
arXiv Detail & Related papers (2023-07-07T13:29:07Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, a last-iterate guarantee of a near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
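For reference, the regret notion described above is commonly written as follows (a textbook definition, not quoted from the paper); since the rewards here are piecewise-stationary, the benchmark mean $\mu_t^*$ may change over time:

```latex
% Expected (dynamic) regret over a horizon of T steps, where \mu_t^* is the
% mean reward of the best arm at step t and a_t is the arm actually played:
R(T) \;=\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \left( \mu_t^{*} - \mu_{a_t} \right) \right]
```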
arXiv Detail & Related papers (2023-06-09T16:10:26Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP (termination MDP) and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Meta-Learning Adversarial Bandits [49.094361442409785]
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure.
As the first work to target the adversarial setting, we design a meta-algorithm with setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-convex sequence of functions.
arXiv Detail & Related papers (2022-05-27T17:40:32Z)
- Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems [11.581678142944318]
We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE).
We propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method.
arXiv Detail & Related papers (2022-01-25T18:52:28Z)
- Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm.
We show that no algorithm can simultaneously perform optimally for both the regret minimization (RM) and BAI objectives.
We also show that BoBW-lil'UCB$(\gamma)$ outperforms a competing algorithm in terms of time complexity and regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.