Forced Exploration in Bandit Problems
- URL: http://arxiv.org/abs/2312.07285v2
- Date: Wed, 13 Dec 2023 03:56:26 GMT
- Title: Forced Exploration in Bandit Problems
- Authors: Han Qi, Fei Guo, Li Zhu
- Abstract summary: The multi-armed bandit (MAB) is a classical sequential decision problem.
This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution.
- Score: 12.13966146283641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multi-armed bandit (MAB) is a classical sequential decision problem. Most
work requires assumptions about the reward distribution (e.g., bounded), while
practitioners may have difficulty obtaining information about these
distributions to design models for their problems, especially in non-stationary
MAB problems. This paper aims to design a multi-armed bandit algorithm that can
be implemented without using information about the reward distribution while
still achieving substantial regret upper bounds. To this end, we propose a
novel algorithm alternating between a greedy rule and forced exploration. Our
method can be applied to Gaussian, Bernoulli and other subgaussian
distributions, and its implementation does not require additional information.
We employ a unified analysis method for different forced exploration strategies
and provide problem-dependent regret upper bounds for stationary and
piecewise-stationary settings. Furthermore, we compare our algorithm with
popular bandit algorithms on different reward distributions.
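The abstract describes an algorithm that alternates between a greedy rule and forced exploration and needs no information about the reward distribution. The following is a minimal sketch of that general idea, not the paper's exact procedure: any arm whose pull count lags behind an exploration schedule is force-pulled, and otherwise the empirically best arm is played. The square-root schedule, the function names, and the Bernoulli test arms are illustrative assumptions.

```python
import numpy as np

def forced_exploration_bandit(pull_arm, n_arms, horizon, schedule=lambda t: np.sqrt(t)):
    """Greedy play interleaved with forced exploration (illustrative sketch).

    Any arm pulled fewer than schedule(t) times at round t is force-pulled;
    otherwise the arm with the highest empirical mean is chosen. No knowledge
    of the reward distribution is required.
    """
    counts = np.zeros(n_arms)   # pulls per arm
    means = np.zeros(n_arms)    # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        lagging = np.flatnonzero(counts < schedule(t))
        if lagging.size > 0:
            arm = int(lagging[np.argmin(counts[lagging])])  # forced exploration
        else:
            arm = int(np.argmax(means))                     # greedy rule
        r = pull_arm(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]        # running-mean update
        total_reward += r
    return total_reward, means

# Example: three hypothetical Bernoulli arms with success probabilities 0.3, 0.5, 0.7.
rng = np.random.default_rng(0)
probs = [0.3, 0.5, 0.7]
reward, estimates = forced_exploration_bandit(lambda a: rng.binomial(1, probs[a]), 3, 10_000)
```

With a schedule that grows more slowly than $t$, forced pulls make up a vanishing fraction of rounds, so play is eventually dominated by the greedy rule; the paper analyzes different forced exploration strategies with a unified method and derives problem-dependent regret bounds for stationary and piecewise-stationary settings.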
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems in heavy-tailed settings.
We demonstrate that INF-clip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
arXiv Detail & Related papers (2023-05-11T12:00:43Z)
- Thompson Sampling on Asymmetric $\alpha$-Stable Bandits [0.0]
The multi-armed bandit problem can be adapted to different assumptions on the reward distribution; this work considers rewards that follow asymmetric $\alpha$-stable distributions.
Thompson Sampling is a common method for solving the multi-armed bandit problem (a minimal sketch of standard Thompson Sampling appears after this list).
arXiv Detail & Related papers (2022-03-19T01:55:08Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback [27.153454425433296]
We consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph.
We provide a comprehensive understanding of this problem by studying both the trajectories and the adversarial setting.
arXiv Detail & Related papers (2020-11-03T03:21:22Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
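Several of the related papers above build on Thompson Sampling (for asymmetric $\alpha$-stable rewards, incentivized exploration, and mean-variance objectives). For reference, here is a minimal sketch of the standard Beta-Bernoulli variant only; it does not implement any of those extensions, and the function name is illustrative.

```python
import numpy as np

def thompson_sampling_bernoulli(pull_arm, n_arms, horizon, rng=None):
    """Standard Thompson Sampling with Beta(1, 1) priors for 0/1 rewards."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.ones(n_arms)   # 1 + observed successes per arm
    beta = np.ones(n_arms)    # 1 + observed failures per arm
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # one posterior sample of each arm's mean
        arm = int(np.argmax(theta))     # play the arm with the largest sample
        r = pull_arm(arm)               # observe a 0/1 reward
        alpha[arm] += r
        beta[arm] += 1 - r
    return alpha / (alpha + beta)       # posterior mean estimate per arm
```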
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.