Online Clustering of Dueling Bandits
- URL: http://arxiv.org/abs/2502.02079v1
- Date: Tue, 04 Feb 2025 07:55:41 GMT
- Title: Online Clustering of Dueling Bandits
- Authors: Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, John C. S. Lui, Zhongxiang Dai
- Abstract summary: We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
- Score: 59.09590979404303
- Abstract: The contextual multi-armed bandit (MAB) is a widely used framework for problems requiring sequential decision-making under uncertainty, such as recommendation systems. In applications involving a large number of users, the performance of contextual MAB can be significantly improved by facilitating collaboration among multiple users. This has been achieved by the clustering of bandits (CB) methods, which adaptively group the users into different clusters and achieve collaboration by allowing the users in the same cluster to share data. However, classical CB algorithms typically rely on numerical reward feedback, which may not be practical in certain real-world applications. For instance, in recommendation systems, it is more realistic and reliable to solicit preference feedback between pairs of recommended items rather than absolute rewards. To address this limitation, we introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback. We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions. Both algorithms are supported by rigorous theoretical analyses, demonstrating that user collaboration leads to improved regret bounds. Extensive empirical evaluations on synthetic and real-world datasets further validate the effectiveness of our methods, establishing their potential in real-world applications involving multiple users with preference-based feedback.
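To make the setting concrete, below is a minimal sketch of a linear dueling bandit with naive user clustering, in the spirit of (but not identical to) the COLDB algorithm described in the abstract. The Bradley-Terry-style preference model and the linear utilities follow the abstract; the least-squares surrogate update, the distance-based clustering rule (`pooled_theta` with threshold `gap`), and the greedy duel selection without confidence bonuses are simplifying assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class UserStats:
    """Per-user sufficient statistics for a linear dueling bandit."""

    def __init__(self, dim, lam=1.0):
        self.V = lam * np.eye(dim)   # regularized Gram matrix of difference vectors
        self.b = np.zeros(dim)       # accumulated preference signal
        self.theta = np.zeros(dim)   # estimate of this user's reward parameter

    def update(self, x, y, pref):
        """pref is 1.0 if arm x was preferred over arm y, else 0.0."""
        z = x - y
        self.V += np.outer(z, z)
        self.b += (pref - 0.5) * z   # least-squares surrogate for the logistic MLE
        self.theta = np.linalg.solve(self.V, self.b)

def pooled_theta(users, u, gap=0.2):
    """Aggregate statistics over users whose estimates lie near user u's.
    The distance test and `gap` threshold are illustrative, not the paper's rule."""
    dim = users[u].theta.shape[0]
    V, b = np.eye(dim), np.zeros(dim)
    for s in users:
        if np.linalg.norm(s.theta - users[u].theta) <= gap:
            V += s.V
            b += s.b
    return np.linalg.solve(V, b)

def select_duel(arms, theta):
    """Greedily pick the two highest-scoring arms to compare.
    A real dueling-bandit rule would add optimism/uncertainty terms."""
    order = np.argsort(arms @ theta)
    return int(order[-1]), int(order[-2])

# Toy run: 8 users drawn from 2 latent clusters, pairwise feedback only.
dim, n_arms, n_users = 5, 20, 8
true_thetas = [rng.normal(size=dim), rng.normal(size=dim)]
users = [UserStats(dim) for _ in range(n_users)]
for t in range(2000):
    u = int(rng.integers(n_users))
    arms = rng.normal(size=(n_arms, dim))               # fresh context vectors
    i, j = select_duel(arms, pooled_theta(users, u))    # act on pooled estimate
    p = sigmoid(arms[i] @ true_thetas[u % 2] - arms[j] @ true_thetas[u % 2])
    users[u].update(arms[i], arms[j], float(rng.random() < p))
```

The pooling step is what the abstract's central claim is about: users in the same cluster share data, so each user's effective sample size grows with the cluster size, which is what drives the improved regret bounds.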
Related papers
- Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for CMAB.
Off-CMAB combines pessimistic reward estimations with solvers.
Experiments on synthetic and real-world datasets highlight the superior performance of the proposed CLCB algorithm.
arXiv Detail & Related papers (2025-01-31T16:56:18Z)
- Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts [27.62165569135504]
A line of research, known as online clustering of bandits, extends contextual MAB by grouping similar users into clusters.
Existing algorithms, which rely on the upper confidence bound (UCB) strategy, struggle to gather adequate statistical information to accurately identify unknown user clusters.
We propose two novel algorithms, UniCLUB and PhaseUniCLUB, which incorporate enhanced exploration mechanisms to accelerate cluster identification; a generic sketch of the UCB index underlying such methods follows this entry.
arXiv Detail & Related papers (2025-01-01T16:38:29Z)
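As referenced above, here is a minimal sketch of the LinUCB-style index that clustering-of-bandits methods typically build on. The exploration weight `alpha` and the ridge statistics `V`, `b` are generic textbook choices, not the exact construction of any paper listed here.

```python
import numpy as np

def linucb_score(x, V, b, alpha=1.0):
    """Upper confidence index for a context/arm vector x.

    V is the (possibly cluster-aggregated) regularized Gram matrix and
    b the accumulated reward vector; theta = V^{-1} b is the ridge estimate.
    The bonus term alpha * sqrt(x^T V^{-1} x) drives exploration.
    """
    V_inv = np.linalg.inv(V)
    theta = V_inv @ b
    return float(x @ theta + alpha * np.sqrt(x @ V_inv @ x))

# Usage: pick the arm with the largest index.
dim = 4
V, b = np.eye(dim), np.zeros(dim)
arms = np.random.default_rng(1).normal(size=(10, dim))
best = max(range(len(arms)), key=lambda i: linucb_score(arms[i], V, b))
```

Roughly, the entry's critique is that this bonus term is tuned for per-user regret and may not explore enough to reliably separate users into the correct clusters; UniCLUB's enhanced exploration targets that gap.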
- Conversational Dueling Bandits in Generalized Linear Models [45.99797764214125]
We introduce relative feedback-based conversations into conversational recommendation systems.
We propose a novel conversational dueling bandit algorithm called ConDuel.
We also demonstrate the potential to extend our algorithm to multinomial logit bandits with theoretical and experimental guarantees.
arXiv Detail & Related papers (2024-07-26T03:43:10Z)
- Graph Neural Bandits [49.85090929163639]
We propose a framework named Graph Neural Bandits (GNB) to leverage the collaborative nature among users, empowered by graph neural networks (GNNs).
To refine the recommendation strategy, we utilize separate GNN-based models on estimated user graphs for exploitation and adaptive exploration.
arXiv Detail & Related papers (2023-08-21T15:57:57Z)
- Ordinal Graph Gamma Belief Network for Social Recommender Systems [54.9487910312535]
We develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions.
OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences.
We extend OGFA to the ordinal graph gamma belief network, a multi-stochastic-layer deep probabilistic model.
arXiv Detail & Related papers (2022-09-12T09:19:22Z)
- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System [0.0]
Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges in multi-armed bandit algorithms and collaborative filtering; a toy sketch of one such combination follows this entry.
arXiv Detail & Related papers (2021-06-21T07:35:39Z)
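As referenced above, here is a toy sketch of combining matrix factorization with a bandit-style exploration rule. This illustrates the general idea only, not BanditMF's actual design: the factor matrices `P` and `Q`, the epsilon-greedy rule, and the SGD update are standard textbook components chosen here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 200, 8
P = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

def recommend(u, eps=0.1):
    """Epsilon-greedy over matrix-factorization scores: with probability
    eps explore a random item, otherwise exploit the current estimate."""
    if rng.random() < eps:
        return int(rng.integers(n_items))
    return int(np.argmax(P[u] @ Q.T))

def update(u, i, reward, lr=0.05, reg=0.01):
    """One SGD step on the squared error of the observed feedback."""
    err = reward - P[u] @ Q[i]
    p, q = P[u].copy(), Q[i].copy()
    P[u] += lr * (err * q - reg * p)
    Q[i] += lr * (err * p - reg * q)

# Online loop: recommend an item, observe feedback, refine the factors.
for t in range(1000):
    u = int(rng.integers(n_users))
    i = recommend(u)
    reward = float(rng.random() < 0.3)   # stand-in for real user feedback
    update(u, i, reward)
```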
- Dual Metric Learning for Effective and Efficient Cross-Domain Recommendations [85.6250759280292]
Cross-domain recommender systems have been increasingly valuable for helping consumers identify useful items in different applications.
Existing cross-domain models typically require a large number of overlapping users, which can be difficult to obtain in some applications.
We propose a novel cross-domain recommendation model based on dual learning that transfers information between two related domains in an iterative manner.
arXiv Detail & Related papers (2021-04-17T09:18:59Z)
- Local Clustering in Contextual Multi-Armed Bandits [44.11480686973274]
We study the problem of identifying user clusters in contextual multi-armed bandits (MAB).
We propose a bandit algorithm, LOCB, embedded with a local clustering procedure.
We evaluate the proposed algorithm from various aspects and show that it outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2021-02-26T21:59:29Z)
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)