Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
- URL: http://arxiv.org/abs/2009.07518v1
- Date: Wed, 16 Sep 2020 07:32:51 GMT
- Title: Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback
- Authors: Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
- Abstract summary: We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite a limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed
Bandits (COM-MAB) show good results on a global accuracy metric. In recommender
systems, this can be achieved through personalization. However, with a
combinatorial online learning approach, personalization requires a large amount
of user feedback. Such feedback can be hard to acquire when users must be
solicited directly and frequently. For many fields of activity undergoing the
digitization of their business, online learning is unavoidable, so several
approaches enabling implicit user feedback retrieval have been implemented.
Nevertheless, implicit feedback can be misleading or inefficient for the
agent's learning. Herein, we propose a novel approach that reduces the amount
of explicit feedback required by Combinatorial Multi-Armed Bandit (COM-MAB)
algorithms while providing levels of global accuracy and learning efficiency
similar to those of classical competitive methods. In this paper we present a
novel approach for considering user feedback and evaluate it using three
distinct strategies. Despite a limited amount of feedback returned by users (as
low as 20% of the total), our approach obtains results similar to those of
state-of-the-art approaches.
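The abstract stays at the conceptual level, so the following is a minimal runnable sketch of the general idea under stated assumptions: a CUCB-style agent plays a super-arm of k items per round, but only a fraction of the played arms (20% here, the paper's low end) returns explicit feedback, and only those arms are updated. The catalogue size, `feedback_rate`, and Bernoulli reward model are illustrative choices, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, k = 20, 5       # hypothetical catalogue: 20 items, recommend 5 per round
feedback_rate = 0.2     # users rate only ~20% of recommended items
true_p = rng.uniform(0.1, 0.9, n_arms)  # hidden per-item click probabilities

counts = np.zeros(n_arms)   # explicit feedbacks observed per arm
means = np.zeros(n_arms)    # empirical mean reward per arm

for t in range(1, 5001):
    # CUCB-style index; never-rated arms get +inf so they are tried first
    with np.errstate(divide="ignore", invalid="ignore"):
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
    ucb[counts == 0] = np.inf

    super_arm = np.argsort(ucb)[-k:]        # play the k highest-index arms

    # Partial feedback: only a random subset of the played arms gets rated,
    # and only those arms are updated
    for a in super_arm[rng.random(k) < feedback_rate]:
        reward = float(rng.random() < true_p[a])
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]   # incremental mean
```

Even with the sparse updates, the agent's estimates concentrate on the high-reward arms given enough rounds, which is the effect the paper quantifies against fully observed baselines.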
Related papers
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528] (2024-04-19)
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversational setting has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
- Kernelized Offline Contextual Dueling Bandits [15.646879026749168] (2023-07-21)
In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback.
We give an upper-confidence-bound style algorithm for this setting and prove a regret bound.
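The entry names an upper-confidence-bound style method without further detail; the sketch below illustrates only the ingredient it does describe, choosing the contexts at which to request human feedback, by querying where a kernelised (GP-style) model's posterior variance is widest. The candidate pool, RBF kernel, length-scale, and greedy max-variance rule are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ls=0.5):
    # RBF kernel matrix between the row vectors of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

pool = rng.uniform(0, 1, size=(50, 2))   # candidate contexts to query
X = np.empty((0, 2))                     # contexts queried so far
lam = 1e-2                               # ridge regulariser

for _ in range(10):
    if len(X) == 0:
        var = np.ones(len(pool))
    else:
        K_inv = np.linalg.inv(rbf(X, X) + lam * np.eye(len(X)))
        Ks = rbf(pool, X)
        # posterior variance of a kernelised model at each candidate context
        var = rbf(pool, pool).diagonal() - np.einsum("ij,jk,ik->i", Ks, K_inv, Ks)
    x = pool[int(np.argmax(var))]        # ask for human (dueling) feedback here
    X = np.vstack([X, x])
```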
- Multi-View Interactive Collaborative Filtering [0.0] (2023-05-14)
We propose MV-ICTR, a novel partially online latent factor recommender algorithm that incorporates both rating and contextual information.
MV-ICTR significantly increases performance on datasets with high percentages of cold-start users and items.
- Multi-Action Dialog Policy Learning from Logged User Feedback [28.4271696269512] (2023-02-27)
A multi-action dialog policy generates multiple atomic dialog actions per turn.
Due to data limitations, existing policy models generalize poorly to unseen dialog flows.
We propose BanditMatch to improve multi-action dialog policy learning with explicit and implicit turn-level user feedback.
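BanditMatch's internals are not described in this entry, so the following shows only the generic setting it operates in: improving a policy from logged, turn-level bandit feedback. The sketch uses an inverse-propensity-scored objective with a linear softmax policy; the uniform logging propensities and binary feedback are assumptions, and multi-action decoding is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical log: for every turn, the action taken, the propensity the
# logging policy assigned to it, and binary turn-level user feedback.
n, n_actions, dim = 1000, 4, 8
ctx = rng.normal(size=(n, dim))
logged_a = rng.integers(0, n_actions, n)
logged_p = np.full(n, 1.0 / n_actions)     # assumed uniform logging policy
feedback = rng.integers(0, 2, n).astype(float)

W = np.zeros((dim, n_actions))             # linear softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(200):
    pi = softmax(ctx @ W)
    pia = pi[np.arange(n), logged_a]       # prob. the policy gives logged actions
    # Gradient ascent on the IPS estimate E[(feedback / propensity) * pi(a|x)]
    g = ((feedback / logged_p) * pia)[:, None] * (np.eye(n_actions)[logged_a] - pi)
    W += 0.01 * ctx.T @ g / n
```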
- The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm [154.47590401735323] (2022-09-30)
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems.
This paper focuses on a challenging scenario where a user has multiple categories of interests.
We propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML).
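Judging from the title alone, the defining idea is to give each user several embedding vectors so that minority interests are not averaged away; a hedged sketch of that multi-vector scoring follows, with a simplified hinge loss. The min-distance rule and the omission of any diversity-promoting regulariser are simplifications, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(3)

n_users, n_items, dim, C = 100, 500, 16, 3         # C interest vectors per user
U = rng.normal(scale=0.1, size=(n_users, C, dim))  # multi-vector user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))     # item embeddings

def score(u, i):
    # CML ranks by distance; with several vectors per user, use the closest one
    return -np.linalg.norm(U[u] - V[i], axis=-1).min()

def hinge_step(u, pos, neg, lr=0.05, margin=0.5):
    # Simplified margin loss on the interest vector closest to the positive item
    c = np.linalg.norm(U[u] - V[pos], axis=-1).argmin()
    d_pos, d_neg = U[u, c] - V[pos], U[u, c] - V[neg]
    if margin + d_pos @ d_pos - d_neg @ d_neg > 0:
        U[u, c] -= lr * 2 * (d_pos - d_neg)
        V[pos] += lr * 2 * d_pos
        V[neg] -= lr * 2 * d_neg
```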
- Adapting Triplet Importance of Implicit Feedback for Personalized Recommendation [43.85549591503592] (2022-08-02)
Implicit feedback is frequently used for developing personalized recommendation services.
We propose a novel training framework named Triplet Importance Learning (TIL), which adaptively learns the importance score of training triplets.
We show that our proposed method outperforms the best existing models by 3-21% in terms of Recall@k for the top-k recommendation.
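The summary says TIL adaptively learns an importance score per training triplet but not how that score is parameterised, so the sketch below only shows where such a weight enters a standard BPR update; the weight `w` is passed in as an assumed input rather than learned.

```python
import numpy as np

rng = np.random.default_rng(4)

n_users, n_items, dim = 100, 500, 16
U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_bpr_step(u, pos, neg, w, lr=0.05):
    # Standard BPR pairwise update scaled by a per-triplet importance w;
    # in TIL that weight is learned, here it is simply an input.
    diff = V[pos] - V[neg]
    g = w * (sigmoid(U[u] @ diff) - 1.0)  # gradient of -w * log sigmoid(u . diff)
    gu, gi = g * diff, g * U[u]
    U[u] -= lr * gu
    V[pos] -= lr * gi
    V[neg] += lr * gi

weighted_bpr_step(0, 10, 42, w=0.7)       # example call with an assumed weight
```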
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839] (2022-06-01)
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
- Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399] (2022-03-25)
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then move to the more challenging case, where users are divided between two types.
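A toy simulation of the setup described, under made-up numbers: because a disliked recommendation can make the user depart, the horizon depends on the policy, so the better arm earns not just a better per-step hit rate but longer sessions.

```python
import numpy as np

rng = np.random.default_rng(5)

p = np.array([0.8, 0.5])   # per-arm probability the user likes a recommendation
quit_prob = 0.7            # chance the user departs after a disliked item

def episode(arm, max_steps=100_000):
    clicks = 0
    for _ in range(max_steps):
        if rng.random() < p[arm]:
            clicks += 1                 # liked: the user stays for another round
        elif rng.random() < quit_prob:
            break                       # disliked: the user may depart for good
    return clicks

# The better arm yields longer sessions, hence a policy-dependent horizon.
print(np.mean([episode(0) for _ in range(1000)]),
      np.mean([episode(1) for _ in range(1000)]))
```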
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579] (2022-03-18)
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
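A self-contained toy of the simulation idea: gold labels from supervised data stand in for users, returning binary reward on sampled answers, and a policy-gradient step improves the policy. Real passages, answer spans, and pretrained readers are abstracted into a small tabular problem.

```python
import numpy as np

rng = np.random.default_rng(6)

# Each "question" has n_spans candidate answers; the gold one, taken from
# supervised data, plays the role of the user accepting or rejecting answers.
n_questions, n_spans = 50, 8
gold = rng.integers(0, n_spans, n_questions)
logits = np.zeros((n_questions, n_spans))     # the "model": per-question scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(2000):
    q = rng.integers(0, n_questions)
    pi = softmax(logits[q])
    s = rng.choice(n_spans, p=pi)             # model-predicted answer
    reward = 1.0 if s == gold[q] else 0.0     # simulated user feedback
    # REINFORCE-style update: raise the log-probability of rewarded answers
    logits[q] += 0.5 * reward * (np.eye(n_spans)[s] - pi)
```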
- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System [0.0] (2021-06-21)
Multi-armed bandits (MAB) provide a principled online learning approach for balancing exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges in the multi-armed bandits algorithm and collaborative filtering.
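The entry does not detail how BanditMF couples the two components, so this is a generic sketch of the combination it names: exploiting matrix-factorization estimates under a bandit policy (epsilon-greedy here, for brevity), with an online SGD update per observed rating. The policy choice and hyperparameters are assumptions, and the actual paper's coupling may differ.

```python
import numpy as np

rng = np.random.default_rng(7)

n_users, n_items, dim = 50, 200, 8
P = rng.normal(scale=0.1, size=(n_users, dim))   # user latent factors (CF side)
Q = rng.normal(scale=0.1, size=(n_items, dim))   # item latent factors
eps = 0.1                                        # exploration rate (bandit side)

def recommend(u):
    # Explore with probability eps, otherwise exploit the MF estimate
    if rng.random() < eps:
        return int(rng.integers(0, n_items))
    return int(np.argmax(P[u] @ Q.T))

def update(u, i, rating, lr=0.05, reg=0.01):
    # One online SGD step on the observed (user, item, rating) feedback
    err = rating - P[u] @ Q[i]
    pu, qi = P[u].copy(), Q[i].copy()
    P[u] += lr * (err * qi - reg * pu)
    Q[i] += lr * (err * pu - reg * qi)
```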
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.