Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback
- URL: http://arxiv.org/abs/2101.08534v1
- Date: Thu, 21 Jan 2021 10:35:09 GMT
- Title: Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback
- Authors: Marc Jourdan, Mojm\'ir Mutn\'y, Johannes Kirschner, Andreas Krause
- Abstract summary: Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
- Score: 51.21673420940346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combinatorial bandits with semi-bandit feedback generalize multi-armed
bandits, where the agent chooses sets of arms and observes a noisy reward for
each arm contained in the chosen set. The action set satisfies a given
structure such as forming a base of a matroid or a path in a graph. We focus on
the pure-exploration problem of identifying the best arm with fixed confidence,
as well as a more general setting, where the structure of the answer set
differs from the one of the action set. Using the recently popularized game
framework, we interpret this problem as a sequential zero-sum game and develop
a CombGame meta-algorithm whose instances are asymptotically optimal algorithms
with finite time guarantees. In addition to comparing two families of learners
to instantiate our meta-algorithm, the main contribution of our work is a
specific oracle efficient instance for best-arm identification with
combinatorial actions. Based on a projection-free online learning algorithm for
convex polytopes, it is the first computationally efficient algorithm which is
asymptotically optimal and has competitive empirical performance.
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed
Bandit [65.268245109828]
We first introduce the Combinatorial Successive Asign algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - On Regret with Multiple Best Arms [12.315392649501101]
We study a regret problem with the existence of multiple best/near-optimal arms in a bandit setting.
Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem.
arXiv Detail & Related papers (2020-06-26T04:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.