Contextual Combinatorial Bandits with Probabilistically Triggered Arms
- URL: http://arxiv.org/abs/2303.17110v2
- Date: Wed, 14 Jun 2023 09:00:10 GMT
- Title: Contextual Combinatorial Bandits with Probabilistically Triggered Arms
- Authors: Xutong Liu, Jinhang Zuo, Siwei Wang, John C.S. Lui, Mohammad
Hajiesmaili, Adam Wierman, Wei Chen
- Abstract summary: We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{KT})$.
- Score: 45.305256056479045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study contextual combinatorial bandits with probabilistically triggered
arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide
range of applications, such as contextual cascading bandits and contextual
influence maximization bandits. Under the triggering probability modulated
(TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel
analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a
potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the
dimension of contexts, $p_{\min}$ is the minimum positive probability that any
arm can be triggered, and batch-size $K$ is the maximum number of arms that can
be triggered per round. Under the variance modulated (VM) or triggering
probability and variance modulated (TPVM) conditions, we propose a new
variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound
$\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a
valuable by-product, our analysis technique and variance-adaptive algorithm can
be applied to the CMAB-T and C$^2$MAB setting, improving existing results there
as well. We also include experiments that demonstrate the improved performance
of our algorithms compared with benchmark algorithms on synthetic and
real-world datasets.
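The abstract does not spell out the algorithmic loop, so the following is a minimal Python sketch of the optimistic select-and-update step that a LinUCB-style combinatorial algorithm such as C$^2$-UCB-T performs. The plain top-$K$ rule, the 0.7 triggering probability, and all names (`ucb_scores`, `round_step`, `env_feedback`) are illustrative stand-ins, not the paper's implementation; the offline oracle and the confidence radius $\beta$ are problem-specific.

```python
import numpy as np

def ucb_scores(contexts, V, b, beta):
    """Optimistic estimate theta_hat^T x + beta * ||x||_{V^{-1}} per base arm."""
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    means = contexts @ theta_hat
    bonuses = beta * np.sqrt(np.einsum("id,dk,ik->i", contexts, V_inv, contexts))
    return means + bonuses

def round_step(contexts, V, b, beta, K, env_feedback):
    """One round: optimistic scoring, oracle call, semi-bandit update."""
    scores = ucb_scores(contexts, V, b, beta)
    super_arm = np.argsort(scores)[-K:]            # top-K stand-in for the offline oracle
    triggered, outcomes = env_feedback(super_arm)  # only triggered arms are observed
    for i, y in zip(triggered, outcomes):
        x = contexts[i]
        V += np.outer(x, x)                        # rank-one Gram-matrix update
        b += y * x                                 # accumulate reward-weighted contexts
    return super_arm, V, b

# Toy run: 10 base arms, d = 3, each chosen arm triggers with probability 0.7.
rng = np.random.default_rng(0)
d, n_arms, K = 3, 10, 3
theta_star = rng.normal(size=d)
contexts = rng.normal(size=(n_arms, d))
V, b = np.eye(d), np.zeros(d)

def env_feedback(super_arm):
    fired = [i for i in super_arm if rng.random() < 0.7]  # probabilistic triggering
    outcomes = [contexts[i] @ theta_star + 0.1 * rng.normal() for i in fired]
    return fired, outcomes

for t in range(50):
    super_arm, V, b = round_step(contexts, V, b, beta=1.0, K=K, env_feedback=env_feedback)
```

A variance-adaptive variant in the spirit of VAC$^2$-UCB would additionally weight each rank-one update by an (estimated) inverse outcome variance, which is what removes the batch-size dependence from the regret bound.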
Related papers
- Combinatorial Logistic Bandits [30.829239785016934]
We introduce a novel framework called combinatorial logistic bandits (CLogB).
In each round, a subset of base arms (called the super arm) is selected, with the outcome of each base arm being binary.
Experiments on real-world datasets demonstrate the superior performance of our algorithms compared to benchmark algorithms.
arXiv Detail & Related papers (2024-10-22T14:52:46Z)
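As a concrete reading of the CLogB feedback model in the entry above, here is a small simulation sketch in which each selected base arm produces a binary outcome through a logistic link on its context; the parameter `theta`, the dimensions, and the chosen super arm are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def play_super_arm(theta, contexts, super_arm):
    """Each selected base arm yields a binary outcome via a logistic link."""
    probs = sigmoid(contexts[super_arm] @ theta)
    return rng.binomial(1, probs)  # one Bernoulli draw per base arm in the super arm

theta = rng.normal(size=5)             # unknown parameter (simulated here)
contexts = rng.normal(size=(20, 5))    # one context vector per base arm
outcomes = play_super_arm(theta, contexts, super_arm=[3, 7, 11])
```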
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems in which the only feedback observed is the joint reward of the selected set of $n$ arms at each time $t \in [T]$.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
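The stochastic-greedy step that gives SGB its name can be sketched as follows: at each of $n$ greedy steps, only a small random pool of remaining arms is evaluated. The explore-then-commit wrapper, the pool size, and the modular stand-in for the marginal gain are assumptions for illustration, not the paper's tuned procedure.

```python
import numpy as np

def stochastic_greedy(gain, ground_set, n, pool_size, rng):
    """Build an n-subset greedily, but evaluate only a small random pool of
    remaining arms at each step (the stochastic-greedy idea)."""
    chosen, remaining = [], set(ground_set)
    for _ in range(n):
        pool = rng.choice(sorted(remaining),
                          size=min(pool_size, len(remaining)), replace=False)
        best = max(pool, key=lambda a: gain(chosen, a))
        chosen.append(int(best))
        remaining.remove(int(best))
    return chosen

# Explore-then-commit flavor: gains come from estimates formed during an
# exploration phase; here a modular stand-in based on simulated means.
rng = np.random.default_rng(1)
means = rng.uniform(size=50)
subset = stochastic_greedy(lambda chosen, a: means[a], range(50),
                           n=5, pool_size=10, rng=rng)
```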
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
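The core estimator behind variance-adaptive linear bandits of this kind is a variance-weighted ridge regression, sketched below; the synthetic data, the regularizer $\lambda = 1$, and the simplifying assumption that the per-round noise levels $\sigma_k^2$ are known are all illustrative.

```python
import numpy as np

def weighted_ridge(X, y, sigma2, lam=1.0):
    """Variance-weighted ridge regression: rounds with smaller noise variance
    get larger weight, the core of variance-adaptive estimators."""
    w = 1.0 / sigma2                                 # per-round weights 1/sigma_k^2
    V = X.T @ (X * w[:, None]) + lam * np.eye(X.shape[1])
    b = X.T @ (w * y)
    return np.linalg.solve(V, b), V                  # theta_hat and Gram matrix

rng = np.random.default_rng(2)
d, T = 4, 200
theta = rng.normal(size=d)
X = rng.normal(size=(T, d))
sigma2 = rng.uniform(0.01, 1.0, size=T)              # heteroscedastic noise levels
y = X @ theta + rng.normal(size=T) * np.sqrt(sigma2)
theta_hat, V = weighted_ridge(X, y, sigma2)
```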
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [53.89752069984382]
We study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose an SESCB algorithm which leverages the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
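One standard form of the variance-aware confidence intervals mentioned in the entry above is the empirical Bernstein bound, where the exploration bonus scales with the empirical variance rather than the range alone; the constants below follow the usual textbook statement and the sampled data are illustrative.

```python
import numpy as np

def bernstein_ucb(samples, delta=0.05):
    """Empirical-Bernstein UCB for [0, 1]-valued outcomes: the bonus scales
    with the empirical variance instead of the range alone."""
    n = len(samples)
    mean, var = np.mean(samples), np.var(samples)
    log_term = np.log(3.0 / delta)
    return mean + np.sqrt(2.0 * var * log_term / n) + 3.0 * log_term / n

rng = np.random.default_rng(3)
arm_samples = rng.binomial(1, 0.3, size=100).astype(float)
print(bernstein_ucb(arm_samples))  # bonus shrinks when the variance is small
```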
- Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds [34.37963000493442]
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both the stochastic and adversarial settings.
arXiv Detail & Related papers (2022-06-14T12:58:46Z)
- Contextual Combinatorial Volatile Bandits via Gaussian Processes [10.312968200748116]
We consider a contextual combinatorial volatile bandit problem with a set of available base arms and their contexts.
We propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB).
We experimentally show that both algorithms vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
arXiv Detail & Related papers (2021-10-05T18:02:10Z)
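A kernelized UCB index of the kind O'CLOK-UCB builds on can be sketched with an off-the-shelf Gaussian process: score each candidate arm by posterior mean plus a multiple of the posterior standard deviation. The RBF kernel, the noise level, $\beta = 2$, and the top-5 selection are illustrative choices, not the paper's configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_scores(X_seen, y_seen, X_candidates, beta=2.0):
    """Kernel UCB index: GP posterior mean plus beta times posterior std."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
    gp.fit(X_seen, y_seen)
    mu, std = gp.predict(X_candidates, return_std=True)
    return mu + beta * std

rng = np.random.default_rng(4)
X_seen = rng.uniform(size=(30, 2))                # contexts of arms played so far
y_seen = np.sin(X_seen.sum(axis=1)) + 0.1 * rng.normal(size=30)
X_candidates = rng.uniform(size=(100, 2))         # base arms available this round
scores = gp_ucb_scores(X_seen, y_seen, X_candidates)
super_arm = np.argsort(scores)[-5:]               # top-5 stand-in for the oracle
```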
- Problem Dependent View on Structured Thresholding Bandit Problems [73.70176003598449]
We investigate the problem-dependent regime of the Thresholding Bandit problem (TBP).
The objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold.
We provide upper and lower bounds for the probability of error in both the concave and monotone settings.
arXiv Detail & Related papers (2021-06-18T15:01:01Z)
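The end-of-game decision rule described in the entry above is easy to state in code: after the sampling budget is spent, output every arm whose empirical mean clears the threshold. The equal-pulls allocation below is a deliberate simplification; the paper's interest is precisely in smarter, problem-dependent allocations.

```python
import numpy as np

def thresholding_output(counts, reward_sums, tau):
    """End-of-game answer for TBP: arms whose empirical means exceed tau."""
    means = reward_sums / np.maximum(counts, 1)
    return np.flatnonzero(means > tau)

rng = np.random.default_rng(5)
true_means = np.array([0.2, 0.45, 0.55, 0.8])
counts = np.full(4, 200)                          # equal-allocation simplification
reward_sums = rng.binomial(counts, true_means).astype(float)
print(thresholding_output(counts, reward_sums, tau=0.5))  # ideally arms 2 and 3
```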
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence-bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)