On Regret with Multiple Best Arms
- URL: http://arxiv.org/abs/2006.14785v2
- Date: Thu, 22 Oct 2020 14:55:32 GMT
- Title: On Regret with Multiple Best Arms
- Authors: Yinglun Zhu and Robert Nowak
- Abstract summary: We study a regret problem with the existence of multiple best/near-optimal arms in a bandit setting.
Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem.
- Score: 12.315392649501101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a regret minimization problem with the existence of multiple
best/near-optimal arms in the multi-armed bandit setting. We consider the case
when the number of arms/actions is comparable or much larger than the time
horizon, and make no assumptions about the structure of the bandit instance.
Our goal is to design algorithms that can automatically adapt to the unknown
hardness of the problem, i.e., the number of best arms. Our setting captures
many modern applications of bandit algorithms where the action space is
enormous and the information about the underlying instance/structure is
unavailable. We first propose an adaptive algorithm that is agnostic to the
hardness level and theoretically derive its regret bound. We then prove a lower
bound for our problem setting, which indicates: (1) no algorithm can be minimax
optimal simultaneously over all hardness levels; and (2) our algorithm achieves
a rate function that is Pareto optimal. With additional knowledge of the
expected reward of the best arm, we propose another adaptive algorithm that is
minimax optimal, up to polylog factors, over all hardness levels. Experimental
results confirm our theoretical guarantees and show advantages of our
algorithms over the previous state-of-the-art.
Related papers
Err
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.