A Novel Confidence-Based Algorithm for Structured Bandits
- URL: http://arxiv.org/abs/2005.11593v1
- Date: Sat, 23 May 2020 19:52:44 GMT
- Title: A Novel Confidence-Based Algorithm for Structured Bandits
- Authors: Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli
- Abstract summary: We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
- Score: 129.30402124516507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study finite-armed stochastic bandits where the rewards of each arm might
be correlated to those of other arms. We introduce a novel phased algorithm
that exploits the given structure to build confidence sets over the parameters
of the true bandit problem and rapidly discard all sub-optimal arms. In
particular, unlike standard bandit algorithms with no structure, we show that
the number of times a suboptimal arm is selected may actually be reduced thanks
to the information collected by pulling other arms. Furthermore, we show that,
in some structures, the regret of an anytime extension of our algorithm is
uniformly bounded over time. For these constant-regret structures, we also
derive a matching lower bound. Finally, we demonstrate numerically that our
approach better exploits certain structures than existing methods.
Related papers
- An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration [3.9901365062418312]
We consider the best arm identification problem in the $K-$armed bandit framework.
Agent is allowed to play a subset of arms at each time slot instead of one arm.
We propose an algorithm that constructs $log K$ groups and performs a likelihood ratio test to detect the presence of the best arm.
arXiv Detail & Related papers (2025-02-03T15:10:08Z) - The Batch Complexity of Bandit Pure Exploration [10.036727981085223]
In a pure exploration problem in multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms distributions.
We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations.
We give an instance-dependent lower bound on the number of batches used by any sample efficient algorithm for any pure exploration task.
arXiv Detail & Related papers (2025-02-03T15:03:45Z) - A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$.
Our refined analysis uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $delta$ vanishes.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arms can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - On Regret with Multiple Best Arms [12.315392649501101]
We study a regret problem with the existence of multiple best/near-optimal arms in a bandit setting.
Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem.
arXiv Detail & Related papers (2020-06-26T04:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.