Contextual Combinatorial Multi-output GP Bandits with Group Constraints
- URL: http://arxiv.org/abs/2111.14778v2
- Date: Mon, 10 Jul 2023 15:11:08 GMT
- Title: Contextual Combinatorial Multi-output GP Bandits with Group Constraints
- Authors: Sepehr Elahi, Baran Atalar, Sevda Öğüt, Cem Tekin
- Abstract summary: In federated multi-armed bandit problems, maximizing global reward while satisfying minimum privacy requirements to protect clients is the main goal.
We consider a contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups and a set of base arms, called a super arm, must be chosen in each round to maximize super arm reward while satisfying the constraints of the rewards of groups from which base arms were chosen.
We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), which balances between maximizing cumulative super arm reward and satisfying group reward constraints.
- Score: 11.317136648551537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In federated multi-armed bandit problems, maximizing global reward while
satisfying minimum privacy requirements to protect clients is the main goal. To
formulate such problems, we consider a combinatorial contextual bandit setting
with groups and changing action sets, where similar base arms arrive in groups
and a set of base arms, called a super arm, must be chosen in each round to
maximize super arm reward while satisfying the constraints of the rewards of
groups from which base arms were chosen. To allow for greater flexibility, we
let each base arm have two outcomes, modeled as the output of a two-output
Gaussian process (GP), where one outcome is used to compute super arm reward
and the other for group reward. We then propose a novel double-UCB GP-bandit
algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence
Bounds (TCGP-UCB), which balances between maximizing cumulative super arm
reward and satisfying group reward constraints and can be tuned to prefer one
over the other. We also define a new notion of regret that combines super arm
regret with group reward constraint satisfaction and prove that TCGP-UCB incurs
$\tilde{O}(\sqrt{\lambda^*(K)KT\overline{\gamma}_{T}} )$ regret with high
probability, where $\overline{\gamma}_{T}$ is the maximum information gain
associated with the set of base arm contexts that appeared in the first $T$
rounds and $K$ is the maximum super arm cardinality over all rounds. We lastly
show in experiments using synthetic and real-world data and based on a
federated learning setup as well as a content-recommendation one that our
algorithm performs better than the current non-GP state-of-the-art
combinatorial bandit algorithm, while satisfying group constraints.
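The double-UCB idea from the abstract can be sketched in code. The snippet below is an illustrative reading of the selection step, not the authors' implementation: it forms upper confidence bounds for both GP outcomes of each base arm, filters arms whose group is optimistically feasible against a threshold, and greedily fills a super arm of size $K$. All names (`mu_super`, `beta`, `threshold`, etc.) are assumptions for the sketch.

```python
import numpy as np

def tcgp_ucb_select(mu_super, sd_super, mu_group, sd_group,
                    groups, K, beta, threshold):
    """Greedy double-UCB super-arm selection (illustrative sketch only).

    mu_super/sd_super: posterior mean/std of each base arm's super-arm outcome.
    mu_group/sd_group: posterior mean/std of each base arm's group outcome.
    groups: group index of each base arm.
    K: super-arm cardinality; beta: exploration weight; threshold: minimum
    optimistic group reward required for a group to be considered feasible.
    """
    # Optimistic (upper-confidence) indices for both GP outputs.
    ucb_super = mu_super + np.sqrt(beta) * sd_super
    ucb_group = mu_group + np.sqrt(beta) * sd_group

    # Sum the group-outcome UCBs within each group; a group is
    # optimistically feasible if its summed UCB clears the threshold.
    group_ucb_sum = np.zeros(int(groups.max()) + 1)
    np.add.at(group_ucb_sum, groups, ucb_group)
    feasible = group_ucb_sum[groups] >= threshold

    # Greedily pick the K feasible arms with the largest super-arm UCB.
    candidates = np.where(feasible)[0]
    order = candidates[np.argsort(-ucb_super[candidates])]
    return order[:K]
```

In the actual algorithm the posterior means and standard deviations would come from a multi-output GP updated after every round; here they are plain arrays so the selection logic stands alone.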
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Bayesian Analysis of Combinatorial Gaussian Process Bandits [6.594362025904486]
We provide novel cumulative regret bounds for three GP-based algorithms: GP-UCB, GP-BayesUCB and GP-TS.
We employ our framework to address the challenging real-world problem of online energy-efficient navigation.
arXiv Detail & Related papers (2023-12-20T00:31:43Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
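The stochastic-greedy principle referenced above can be sketched briefly. This is a generic illustration of the technique (scoring only a random subset of candidates at each greedy step to cut evaluation cost), not SGB itself; the marginal-gain oracle `gain` is an assumed interface.

```python
import random

def stochastic_greedy(arms, n, gain, sample_size, rng=None):
    """Select n arms by stochastic greedy: each step scores only a random
    subset of the remaining arms instead of all of them.

    `gain(arm, chosen)` is an assumed black-box marginal-gain oracle;
    this sketch does not reproduce the paper's explore-then-commit phases.
    """
    rng = rng or random.Random(0)
    chosen, remaining = [], list(arms)
    for _ in range(n):
        # Sample a small candidate pool rather than scoring everything.
        pool = rng.sample(remaining, min(sample_size, len(remaining)))
        best = max(pool, key=lambda a: gain(a, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `sample_size` equal to the number of arms this reduces to ordinary greedy; smaller pools trade selection quality for speed, which is the point when the base-arm set is large.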
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Max-Quantile Grouped Infinite-Arm Bandits [39.7239093424124]
We consider a bandit problem in which there are a number of groups, each consisting of infinitely many arms.
We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm.
We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter.
arXiv Detail & Related papers (2022-10-04T00:46:49Z) - Risk-Aware Algorithms for Combinatorial Semi-Bandits [7.716156977428555]
We study the multi-armed bandit problem under semi-bandit feedback.
We consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards.
We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the bandit.
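For concreteness, the risk measure named above has a simple empirical form: CVaR at level $\alpha$ averages the worst $\alpha$-fraction of rewards. The snippet below is a generic definition of that quantity, not the paper's algorithm; tie-handling and interpolation conventions vary across formulations.

```python
import numpy as np

def empirical_cvar(rewards, alpha):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of rewards.

    Illustrative definition assuming rewards (higher is better), so the
    tail of interest is the lowest alpha-fraction of the sample.
    """
    r = np.sort(np.asarray(rewards, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(r))))       # size of the worst tail
    return r[:k].mean()
```

As $\alpha \to 1$ this recovers the plain mean, while small $\alpha$ focuses entirely on worst-case outcomes, which is why CVaR-maximizing super arms are more conservative than expectation-maximizing ones.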
arXiv Detail & Related papers (2021-12-02T11:29:43Z) - Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
arXiv Detail & Related papers (2021-11-17T01:59:15Z) - Contextual Combinatorial Volatile Bandits via Gaussian Processes [10.312968200748116]
We consider a contextual bandit problem with a set of available base arms and their contexts.
We propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB).
We experimentally show that both algorithms vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
arXiv Detail & Related papers (2021-10-05T18:02:10Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.