Multi-User Contextual Cascading Bandits for Personalized Recommendation
- URL: http://arxiv.org/abs/2508.13981v2
- Date: Sun, 24 Aug 2025 18:18:21 GMT
- Title: Multi-User Contextual Cascading Bandits for Personalized Recommendation
- Authors: Jiho Park, Huiwen Jia,
- Abstract summary: Multi-User Contextual Cascading Bandit model captures realistic online advertising scenarios where multiple users interact with sequentially displayed items simultaneously.<n>We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $widetildeO(sqrtTHN)$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode.<n>Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, termed Active Upper Confidence Bound with
- Score: 2.506355754272765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a Multi-User Contextual Cascading Bandit model, a new combinatorial bandit framework that captures realistic online advertising scenarios where multiple users interact with sequentially displayed items simultaneously. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $\widetilde{O}(\sqrt{THN})$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, termed Active Upper Confidence Bound with Backward Planning (AUCBBP), which shows a strict efficiency improvement in context scaling, i.e., user scaling, with a regret bound of $\widetilde{O}(\sqrt{T+HN})$. We validate our theoretical findings via numerical experiments, demonstrating the empirical effectiveness of both algorithms under various settings.
Related papers
- $β$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment [53.42377319350806]
$$-CLIP is a multi-granular text-conditioned contrastive learning framework.<n>$$-CAL addresses the semantic overlap inherent in this hierarchy.<n>$$-CLIP establishes a robust, adaptive baseline for dense vision-language correspondence.
arXiv Detail & Related papers (2025-12-14T13:03:20Z) - Multi Activity Sequence Alignment via Implicit Clustering [50.3168866743067]
We propose a novel framework that overcomes limitations using sequence alignment via implicit clustering.<n>Specifically, our key idea is to perform implicit clip-level clustering while aligning frames in sequences.<n>Our experiments show that our proposed method outperforms state-of-the-art results.
arXiv Detail & Related papers (2025-03-16T14:28:46Z) - Optimal Best Arm Identification with Post-Action Context [15.613350380708798]
We introduce the problem of best arm identification with post-action context.<n>We analyze two different types of the post-action context.<n>For the non-separator setting, we do so by demonstrating that the Track-and-Stop algorithm can be extended to this setting.<n>For the separator setting, we propose a novel sampling rule called $textitG-tracking$, which uses the geometry of the context space to directly track the contexts rather than the actions.
arXiv Detail & Related papers (2025-02-05T10:47:05Z) - Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.<n>We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$.
Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot
Fine-grained Action Recognition [80.21796574234287]
M$3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates textitmulti-view encoding, textitmulti-view matching, and textitmulti-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z) - Efficient Explorative Key-term Selection Strategies for Conversational
Contextual Bandits [46.49854998602084]
We propose ConLinUCB", a general framework for conversational bandits with better information incorporation.
We also design two bandit algorithms with explorative key-term selection strategies, ConLinUCB-BS and ConLinUCB-MCR.
Experiments on synthetic and real-world data show significant advantages of our algorithms in learning accuracy (up to 54% improvement) and computational efficiency (up to 72% improvement)
arXiv Detail & Related papers (2023-03-01T08:24:54Z) - Dueling Bandits: From Two-dueling to Multi-dueling [40.4378338001229]
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms.
This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options.
arXiv Detail & Related papers (2022-11-16T22:00:54Z) - Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [59.8188496313214]
We study the semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.