Tight Lower Bounds for Combinatorial Multi-Armed Bandits
- URL: http://arxiv.org/abs/2002.05392v3
- Date: Thu, 16 Jul 2020 11:26:58 GMT
- Title: Tight Lower Bounds for Combinatorial Multi-Armed Bandits
- Authors: Nadav Merlis, Shie Mannor
- Abstract summary: The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
- Score: 72.56064196252498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Combinatorial Multi-Armed Bandit problem is a sequential decision-making
problem in which an agent selects a set of arms on each round, observes
feedback for each of these arms and aims to maximize a known reward function of
the arms it chose. While previous work proved regret upper bounds in this
setting for general reward functions, only a few works provided matching lower
bounds, all for specific reward functions. In this work, we prove regret lower
bounds for combinatorial bandits that hold under mild assumptions for all
smooth reward functions. We derive both problem-dependent and
problem-independent bounds and show that the recently proposed Gini-weighted
smoothness parameter (Merlis and Mannor, 2019) also determines the lower bounds
for monotone reward functions. Notably, this implies that our lower bounds are
tight up to log-factors.
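As a rough illustration of the setting described in the abstract (and not of the paper's lower-bound construction), the sketch below simulates a combinatorial semi-bandit interaction: the learner picks a size-k subset of Bernoulli arms, observes per-arm feedback, and earns a known monotone reward of the chosen arms. The arm count, the top-k constraint, and the "at least one arm fires" reward function are illustrative assumptions, not taken from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, k, horizon = 10, 3, 1000          # illustrative sizes, not from the paper
mu = rng.uniform(0.1, 0.9, size=n_arms)   # unknown Bernoulli arm means

def reward(x):
    """Known monotone, smooth reward of the per-arm values in a chosen set.
    Here: probability that at least one chosen arm fires (a standard CMAB example)."""
    return 1.0 - np.prod(1.0 - x)

best_set = np.argsort(mu)[-k:]       # best fixed size-k subset in hindsight
opt_value = reward(mu[best_set])     # its expected reward (arms are independent)

pseudo_regret = 0.0
for t in range(horizon):
    chosen = rng.choice(n_arms, size=k, replace=False)  # placeholder policy: uniform
    feedback = rng.binomial(1, mu[chosen])              # semi-bandit: one sample per chosen arm
    observed_reward = reward(feedback)                  # realized reward seen by the learner
    pseudo_regret += opt_value - reward(mu[chosen])     # expected-regret increment

print(f"pseudo-regret of the uniform policy after {horizon} rounds: {pseudo_regret:.1f}")
```
Any learning algorithm would replace the uniform choice with a data-driven one; the lower bounds of the paper limit how small the resulting pseudo-regret can be made.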
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - Regret Lower Bounds in Multi-agent Multi-armed Bandit [14.822625665220068]
The multi-armed bandit problem motivates methods with provable upper bounds on regret.
We provide the first comprehensive study on regret lower bounds across different settings.
arXiv Detail & Related papers (2023-08-15T21:20:24Z) - Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms [32.313813562222066]
We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting.
Each shareable arm has a finite reward capacity and a "per-load" reward distribution.
We propose an online learning algorithm to address the problem and prove its regret upper bound.
arXiv Detail & Related papers (2022-06-17T13:47:27Z) - Contextual Combinatorial Multi-output GP Bandits with Group Constraints [11.317136648551537]
In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.
We consider a contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups and a set of base arms, called a super arm, must be chosen in each round to maximize the super arm reward while satisfying reward constraints for the groups from which the base arms were chosen.
We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatorial GP Upper Confidence Bounds (TCGP-UCB), which balances maximizing cumulative super arm reward against satisfying group reward constraints.
arXiv Detail & Related papers (2021-11-29T18:39:09Z) - Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
arXiv Detail & Related papers (2021-11-17T01:59:15Z) - Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits [66.02233330016435]
We provide problem-dependent guarantees on both the regret and the amount of feedback requested.
We present a new algorithm called BuFALU for which we derive problem-dependent regret and cumulative feedback bounds.
arXiv Detail & Related papers (2021-10-12T03:24:57Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) we consider the general setting where more than one arm can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the feasible subset of maximum expected reward among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a UCB-based bandit algorithm (see the CUCB-style sketch after this list).
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, the proposed method is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
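Several entries above (EgalUCB, TCGP-UCB, and the UCB-based adaptation of the greedy blocking-bandit heuristic) rely on optimism through per-arm upper confidence bounds. As a generic sketch of that idea, and not a reproduction of any specific paper's algorithm, a CUCB-style selection and update step might look as follows; the exploration constant alpha and the plain top-k oracle are assumptions.
```python
import numpy as np

def cucb_select(counts, mean_est, t, k, alpha=1.5):
    """One CUCB-style step: form optimistic per-arm indices and hand them to a
    combinatorial oracle (here simply: pick the k largest indices).
    counts and mean_est are NumPy arrays, e.g. initialized with np.zeros(n_arms)."""
    # arms never observed get an infinite index so they are explored first
    bonus = np.where(
        counts > 0,
        np.sqrt(alpha * np.log(max(t, 2)) / np.maximum(counts, 1)),
        np.inf,
    )
    ucb = mean_est + bonus
    return np.argsort(ucb)[-k:]          # oracle for a plain top-k constraint

def cucb_update(counts, mean_est, chosen, feedback):
    """Fold semi-bandit feedback for every base arm in the chosen super arm into
    the empirical means (in place)."""
    for i, x in zip(chosen, feedback):
        counts[i] += 1
        mean_est[i] += (x - mean_est[i]) / counts[i]   # running mean
```
Plugged into the simulation loop sketched after the abstract (replacing the uniform choice with cucb_select and calling cucb_update on the observed feedback), this yields a basic combinatorial semi-bandit learner.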
This list is automatically generated from the titles and abstracts of the papers in this site.