Related papers: Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

URL: http://arxiv.org/abs/2208.14837v3
Date: Mon, 18 Nov 2024 17:30:05 GMT
Title: Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
Authors: Xutong Liu, Jinhang Zuo, Siwei Wang, Carlee Joe-Wong, John C. S. Lui, Wei Chen,
Abstract summary: We study the semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition.
Score: 59.8188496313214
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the combinatorial semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct regret analysis which reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.

Related papers

Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits [12.655523567240042]
Minimax Optimal Strategy (CMOSS) achieves an instance-independent regret of $Obig(log k)2sqrtkmTbig )$ under semi-bandit feedback.<n>We show that CMOSS consistently outperforms benchmark algorithms in both regret runtime and efficiency.
arXiv Detail & Related papers (2025-08-08T12:01:50Z)
Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
Combinatorial Logistic Bandits [30.829239785016934]
We introduce a novel framework called logistic bandits (CLogB)<n>In each round, a subset of base arms (called the super arm) is selected, with the outcome of each base arm being binary.<n>Experiments on real-world datasets demonstrate the superior performance of our algorithms compared to benchmark algorithms.
arXiv Detail & Related papers (2024-10-22T14:52:46Z)
Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints. We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z)
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed. SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM) We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
Multi-Fidelity Multi-Armed Bandits Revisited [46.19926456682379]
We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy.
arXiv Detail & Related papers (2023-06-13T13:19:20Z)
Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback [9.771002043127728]
We consider a multi-armed bandit problem for maximum value reward function under maximum value and index feedback. We propose an algorithm and provide a regret bound for problem instances with arm outcomes according to arbitrary distributions with finite supports. Our algorithm achieves a $O((k/Delta)log(T))$ distribution-dependent and a $tildeO(sqrtT)$ distribution-independent regret.
arXiv Detail & Related papers (2023-05-25T14:02:12Z)
Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual bandits with probabilistically triggered arms (C$2$MAB-T) under a variety of smoothness conditions. Under the triggering modulated (TPM) condition, we devise the C$2$-UC-T algorithm and derive a regret bound $tildeO(dsqrtT)$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z)
Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds [34.37963000493442]
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both adversarial settings.
arXiv Detail & Related papers (2022-06-14T12:58:46Z)
Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits [91.8283876874947]
We design and analyze the BoBW-lil'UCB$(gamma)$ algorithm. We show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives. We also show that BoBW-lil'UCB$(gamma)$ outperforms a competitor in terms of the time complexity and the regret.
arXiv Detail & Related papers (2021-10-16T17:52:32Z)
Contextual Combinatorial Volatile Bandits via Gaussian Processes [10.312968200748116]
We consider a contextual bandit problem with a set of available base arms and their contexts. We propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB) We experimentally show that both algorithms vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
arXiv Detail & Related papers (2021-10-05T18:02:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.