Simple Modification of the Upper Confidence Bound Algorithm by
Generalized Weighted Averages
- URL: http://arxiv.org/abs/2308.14350v1
- Date: Mon, 28 Aug 2023 06:53:31 GMT
- Title: Simple Modification of the Upper Confidence Bound Algorithm by
Generalized Weighted Averages
- Authors: Nobuhito Manome, Shuji Shinohara, Ung-il Chung
- Abstract summary: The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning.
We propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by extending UCB1, which is a representative algorithm for MAB problems.
GWA-UCB1 outperformed G-UCB1, UCB1-Tuned, and Thompson sampling in most problem settings and can be useful in many situations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multi-armed bandit (MAB) problem is a classical problem that models
sequential decision-making under uncertainty in reinforcement learning. In this
study, we propose a new generalized upper confidence bound (UCB) algorithm
(GWA-UCB1) by extending UCB1, which is a representative algorithm for MAB
problems, using generalized weighted averages, and present an effective
algorithm for various problem settings. GWA-UCB1 is a two-parameter
generalization of the balance between exploration and exploitation in UCB1 and
can be implemented with a simple modification of the UCB1 formula. Therefore,
this algorithm can be easily applied to UCB-based reinforcement learning
models. In preliminary experiments, we investigated the optimal parameters of a
simple generalized UCB1 (G-UCB1), prepared for comparison and GWA-UCB1, in a
stochastic MAB problem with two arms. Subsequently, we confirmed the
performance of the algorithms with the investigated parameters on stochastic
MAB problems when arm reward probabilities were sampled from uniform or normal
distributions and on survival MAB problems assuming more realistic situations.
GWA-UCB1 outperformed G-UCB1, UCB1-Tuned, and Thompson sampling in most problem
settings and can be useful in many situations. The code is available at
https://github.com/manome/python-mab.
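Based on the abstract's description, the "simple modification" can be sketched as follows. The abstract does not give the exact formula, so the power-mean form below, with weight parameter w and exponent p, and the helper names gwa_ucb1_index and run_bandit, are assumptions for illustration; the authors' reference implementation lives at the linked repository.

```python
import math
import random

def gwa_ucb1_index(mean, n_i, t, w=0.5, p=1.0):
    """Score an arm by a generalized weighted (power) mean of the
    exploitation term (empirical mean reward) and the UCB1 exploration
    bonus sqrt(2 ln t / n_i). With w=0.5 and p=1 this is the plain
    arithmetic average of the two terms; UCB1 itself sums them instead.
    This exact form is an assumption, not the paper's stated formula."""
    explore = math.sqrt(2.0 * math.log(t) / n_i)
    return ((1.0 - w) * mean ** p + w * explore ** p) ** (1.0 / p)

def run_bandit(probs, horizon, w=0.5, p=2.0, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds using the index
    above and return the total reward collected."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: gwa_ucb1_index(means[i], counts[i], t, w, p))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total
```

The two parameters (w, p) tune the exploration-exploitation balance, which is the sense in which the algorithm is a "two-parameter generalization" of UCB1.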
Related papers
- UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity [0.8283940114367679]
This paper investigates the performance of upper confidence bound (UCB) algorithms in large-scale pure exploration problems. We show that the meta-UCB algorithm, and therefore a broad class of UCB algorithms, can achieve sample optimality. The results demonstrate the applicability of UCB algorithms to large-scale pure exploration problems with non-sub-Gaussian distributions.
arXiv Detail & Related papers (2025-11-27T09:54:22Z)
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of the arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm achieves sublinear regret comparable to existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation [0.0]
This paper proposes a new algorithm that combines multi-armed bandits from the reinforcement learning domain with random search strategies to solve discrete optimization problems via simulation.
Our aim is to combine the properties of multi-armed bandits with the ability of genetic algorithms to handle high-dimensional solution spaces.
arXiv Detail & Related papers (2023-02-15T14:46:19Z)
- An Application of a Multivariate Estimation of Distribution Algorithm to Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
One might expect the more sophisticated algorithm to yield better performance on a complex problem like this, but it did not.
We hypothesise that this is because the more sophisticated algorithm is impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z)
- Fixed-Budget Best-Arm Identification in Structured Bandits [33.27743152847947]
Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations.
We propose a general tractable algorithm that incorporates the structure by eliminating suboptimal arms based on their mean reward estimates from a joint generalization model.
arXiv Detail & Related papers (2021-06-09T01:32:43Z)
- A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms [8.099977107670918]
Upper Confidence Bound (UCB) is an optimism-based MAB algorithm.
This paper provides new results on the arm-sampling behavior of UCB.
It also provides the first process-level characterization of the MAB problem under UCB.
arXiv Detail & Related papers (2021-06-03T20:52:26Z)
- Multi-armed Bandits with Cost Subsidy [1.6631602844999724]
We present two applications: the intelligent SMS routing problem and the ad audience optimization problem.
We show that naive generalizations of existing MAB algorithms do not perform well for this problem.
We also present a simple variant of explore-then-commit and establish near-optimal regret bounds for this algorithm.
arXiv Detail & Related papers (2020-11-03T05:38:42Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
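For reference, the textbook Beta-Bernoulli Thompson Sampling that GWA-UCB1 is compared against in the main paper can be sketched as follows. This is a minimal sketch of the standard algorithm, not the mean-variance variant of the entry above; the function name thompson_sampling is chosen here for illustration.

```python
import random

def thompson_sampling(probs, horizon, seed=0):
    """Standard Beta-Bernoulli Thompson Sampling: keep a
    Beta(successes + 1, failures + 1) posterior per arm, draw one
    sample from each posterior, and pull the arm with the largest
    sample. Returns the total reward over `horizon` rounds."""
    rng = random.Random(seed)
    k = len(probs)
    succ = [0] * k
    fail = [0] * k
    total = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        if reward:
            succ[arm] += 1
        else:
            fail[arm] += 1
        total += reward
    return total
```

Randomizing over the posterior gives "free" exploration that concentrates on the best arm as its posterior sharpens, which is the behavior GWA-UCB1 is benchmarked against in the stochastic and survival MAB experiments.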
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.