Conversational Dueling Bandits in Generalized Linear Models
- URL: http://arxiv.org/abs/2407.18488v1
- Date: Fri, 26 Jul 2024 03:43:10 GMT
- Title: Conversational Dueling Bandits in Generalized Linear Models
- Authors: Shuhua Yang, Hui Yuan, Xiaoying Zhang, Mengdi Wang, Hong Zhang, Huazheng Wang
- Abstract summary: We introduce relative feedback-based conversations into conversational recommendation systems.
We propose a novel conversational dueling bandit algorithm called ConDuel.
We also demonstrate the potential to extend our algorithm to multinomial logit bandits with theoretical and experimental guarantees.
- Score: 45.99797764214125
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities. Such systems utilize a multi-armed bandit framework to learn user preferences in an online manner and have achieved great success in recent years. However, existing conversational bandit methods have several limitations. First, they only enable users to provide explicit binary feedback on the recommended items or categories, leading to ambiguity in interpretation. In practice, users are usually faced with more than one choice. Relative feedback, known for its informativeness, has gained increasing popularity in recommendation system design. Moreover, current contextual bandit methods mainly work under linear reward assumptions, ignoring practical non-linear reward structures captured by generalized linear models. Therefore, in this paper, we introduce relative feedback-based conversations into conversational recommendation systems through the integration of dueling bandits in generalized linear models (GLM) and propose a novel conversational dueling bandit algorithm called ConDuel. Theoretical analyses of regret upper bounds and empirical validations on synthetic and real-world data underscore ConDuel's efficacy. We also demonstrate the potential to extend our algorithm to multinomial logit bandits with theoretical and experimental guarantees, which further demonstrates the applicability of the proposed framework.
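Since no code accompanies this listing, here is a minimal sketch of the generalized linear dueling-bandit setting the abstract describes: a logistic (Bradley-Terry) preference model P(i beats j) = sigmoid(theta^T (x_i - x_j)), a ridge-regularized maximum-likelihood estimate of theta, and an optimistic pair-selection rule. The class name, the constant exploration width `alpha`, and the Newton-step refit are illustrative assumptions; ConDuel itself, including its conversational key-term mechanism and the exact confidence widths behind the regret bounds, is specified in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GLMDuelingBanditSketch:
    """Logistic dueling bandit: P(arm i beats arm j) = sigmoid(theta^T (x_i - x_j))."""

    def __init__(self, dim, reg=1.0, alpha=1.0):
        self.theta = np.zeros(dim)   # estimated preference parameter
        self.V = reg * np.eye(dim)   # regularized design matrix for confidence widths
        self.alpha = alpha           # exploration width (illustrative constant)
        self.diffs = []              # stored feature differences x_win - x_lose

    def select_duel(self, arms):
        """Pick the arm pair with the largest optimistic win probability."""
        Vinv = np.linalg.inv(self.V)
        best_pair, best_val = (0, 1), -np.inf
        for i in range(len(arms)):
            for j in range(len(arms)):
                if i == j:
                    continue
                d = arms[i] - arms[j]
                ucb = sigmoid(d @ self.theta) + self.alpha * np.sqrt(d @ Vinv @ d)
                if ucb > best_val:
                    best_pair, best_val = (i, j), ucb
        return best_pair

    def update(self, x_win, x_lose, newton_steps=5):
        """Record a duel outcome and refit theta by Newton steps on the ridge-regularized logistic MLE."""
        d_new = x_win - x_lose
        self.diffs.append(d_new)
        self.V += np.outer(d_new, d_new)
        for _ in range(newton_steps):
            p = np.array([sigmoid(d @ self.theta) for d in self.diffs])
            grad = sum((pi - 1.0) * d for pi, d in zip(p, self.diffs)) + self.theta
            hess = sum(pi * (1.0 - pi) * np.outer(d, d)
                       for pi, d in zip(p, self.diffs)) + np.eye(len(self.theta))
            self.theta -= np.linalg.solve(hess, grad)
```

A round would look like `i, j = bandit.select_duel(arms)`, presenting both options to the user, then `bandit.update(arms[winner], arms[loser])` once the relative preference is observed.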
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
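As a rough illustration of the idea in the entry above (fitting a neural reward model from pairwise preference feedback), here is a minimal PyTorch sketch using the Bradley-Terry likelihood. The network width, optimizer, and the absence of any exploration bonus are simplifications and assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

# Neural reward model f(x); preference model P(i beats j) = sigmoid(f(x_i) - f(x_j)).
reward_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def preference_loss(x_win, x_lose):
    """Bradley-Terry negative log-likelihood of the observed duel outcomes."""
    margin = reward_net(x_win) - reward_net(x_lose)
    return nn.functional.softplus(-margin).mean()  # -log sigmoid(margin)

# Example update on a batch of observed duels (random data for illustration).
x_win, x_lose = torch.randn(32, 8), torch.randn(32, 8)
opt.zero_grad()
preference_loss(x_win, x_lose).backward()
opt.step()
```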
- Neural Contextual Bandits for Personalized Recommendation [49.85090929163639]
This tutorial investigates contextual bandits as a powerful framework for personalized recommendations.
We focus on the exploration perspective of contextual bandits to alleviate the "Matthew Effect" in recommender systems.
In addition to conventional linear contextual bandits, we also cover neural contextual bandits.
arXiv Detail & Related papers (2023-12-21T17:03:26Z)
- Hierarchical Conversational Preference Elicitation with Bandit Feedback [36.507341041113825]
We formulate a new conversational bandit problem that allows the recommender system to choose either a key-term or an item to recommend at each round.
We conduct a survey and analyze a real-world dataset to find that, unlike assumptions made in prior works, key-term rewards are mainly affected by rewards of representative items.
We propose two bandit algorithms, Hier-UCB and Hier-LinUCB, that leverage this observed relationship and the hierarchical structure between key-terms and items.
arXiv Detail & Related papers (2022-09-06T05:35:24Z)
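The hierarchical selection described in the entry above can be caricatured as follows: items and key-terms share one linear model, key-terms are represented by the mean features of their representative items (matching the relationship the paper reports), and each round the agent queries whichever candidate has the highest UCB. The shared-parameter update and constant `alpha` are assumptions for illustration, not Hier-UCB or Hier-LinUCB as specified in the paper.

```python
import numpy as np

class HierLinUCBSketch:
    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.A = reg * np.eye(dim)   # shared design matrix
        self.b = np.zeros(dim)       # shared response vector
        self.alpha = alpha           # exploration width (illustrative)

    def _ucb(self, x, Ainv, theta):
        return x @ theta + self.alpha * np.sqrt(x @ Ainv @ x)

    def choose(self, item_feats, keyterm_feats):
        """Return ('item', i) or ('keyterm', k), whichever has the highest UCB.

        keyterm_feats: each key-term is the mean feature vector of its
        representative items, reflecting the relationship found in the paper.
        """
        Ainv = np.linalg.inv(self.A)
        theta = Ainv @ self.b
        cands = [("item", i, self._ucb(x, Ainv, theta)) for i, x in enumerate(item_feats)]
        cands += [("keyterm", k, self._ucb(x, Ainv, theta)) for k, x in enumerate(keyterm_feats)]
        kind, idx, _ = max(cands, key=lambda c: c[2])
        return kind, idx

    def update(self, x, reward):
        """Standard LinUCB-style update on whichever feedback was collected."""
        self.A += np.outer(x, x)
        self.b += reward * x
```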
- Comparison-based Conversational Recommender System with Relative Bandit Feedback [15.680698037463488]
We propose a novel comparison-based conversational recommender system.
We propose a new bandit algorithm, which we call RelativeConUCB.
The experiments on both synthetic and real-world datasets validate the advantage of our proposed method.
arXiv Detail & Related papers (2022-08-21T08:05:46Z)
- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System [0.0]
Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges arising in multi-armed bandit algorithms and collaborative filtering.
arXiv Detail & Related papers (2021-06-21T07:35:39Z)
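In the spirit of the BanditMF entry above, the sketch below couples a matrix-factorization scorer with an epsilon-greedy bandit policy. The factor dimensions, learning rate, and the epsilon-greedy choice (rather than the paper's actual MAB component) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8
U = rng.normal(size=(n_users, k)) * 0.1   # user factors (would be trained by CF)
V = rng.normal(size=(n_items, k)) * 0.1   # item factors

def recommend(user, eps=0.1):
    """Epsilon-greedy over MF scores: explore randomly, else exploit best prediction."""
    if rng.random() < eps:
        return int(rng.integers(n_items))
    return int(np.argmax(U[user] @ V.T))

def update(user, item, reward, lr=0.05):
    """One SGD step on the squared error between reward and the MF prediction."""
    err = reward - U[user] @ V[item]
    u_old = U[user].copy()
    U[user] += lr * err * V[item]
    V[item] += lr * err * u_old
```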
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
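The IDS rule mentioned in the entry above trades off regret against information: among candidate duels it minimizes squared expected regret divided by expected information gain. The toy selector below assumes caller-supplied estimates of both quantities; the paper's kernelized construction of these estimates is the substantive contribution and is not reproduced here.

```python
import numpy as np

def ids_select(duels, expected_regret, expected_info_gain):
    """Pick the duel minimizing the information ratio (regret^2 / information gain)."""
    ratios = [expected_regret(d) ** 2 / max(expected_info_gain(d), 1e-12)
              for d in duels]
    return duels[int(np.argmin(ratios))]
```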
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)
- Reward Constrained Interactive Recommendation with Natural Language Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations that violate the user's historical preferences.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
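For the last entry, a hedged sketch of the reward-shaping idea: a discriminator estimates how strongly a recommendation violates the user's historical preferences, and its score penalizes the environment reward. The discriminator architecture and the fixed penalty weight are placeholder assumptions; the paper's constraint-augmented RL framework is more elaborate.

```python
import torch
import torch.nn as nn

# Placeholder discriminator scoring the probability that a (state, action)
# pair violates the user's historical preferences.
discriminator = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
lam = 1.0  # penalty weight (fixed here; could instead be a learned multiplier)

def shaped_reward(env_reward: torch.Tensor, state_action: torch.Tensor) -> torch.Tensor:
    """Subtract the discriminator's violation score from the RL reward."""
    violation = discriminator(state_action).squeeze(-1)
    return env_reward - lam * violation.detach()
```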