Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users'
Feedback
- URL: http://arxiv.org/abs/2009.07518v1
- Date: Wed, 16 Sep 2020 07:32:51 GMT
- Title: Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users'
Feedback
- Authors: Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
- Abstract summary: We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite a limited amount of feedback returned by users (as low as 20% of the total), our approach achieves results similar to those of state-of-the-art approaches.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed
Bandits (COM-MAB) show good results on a global accuracy metric. In the case of
recommender systems, this can be achieved through personalization. However,
with a combinatorial online learning approach, personalization requires a large
amount of user feedback. Such feedback can be hard to acquire when users need
to be solicited directly and frequently. For many fields of activity undergoing
the digitization of their business, online learning is unavoidable. Thus, a
number of approaches allowing implicit user feedback retrieval have been
implemented. Nevertheless, this implicit feedback can be misleading or
inefficient for the agent's learning. Herein, we propose a novel approach that
reduces the number of explicit feedback requests required by Combinatorial
Multi-Armed Bandit (COM-MAB) algorithms while providing levels of global
accuracy and learning efficiency similar to those of classical competitive
methods. In this paper we present a novel approach for considering user
feedback and evaluate it using three distinct strategies. Despite a limited
amount of feedback returned by users (as low as 20% of the total), our approach
obtains results similar to those of state-of-the-art approaches.
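The abstract describes a COM-MAB agent learning under scarce explicit feedback, but this listing does not give the authors' actual strategies. The following is a minimal illustrative sketch, not the paper's method: a combinatorial UCB agent that plays a super-arm of k arms each round yet only receives per-arm (semi-bandit) feedback on a fraction of rounds, controlled by a hypothetical `feedback_rate` parameter set to the 20% figure from the abstract. All names, parameters, and the reward model are assumptions made for illustration.

```python
import math
import random

def cucb_scarce_feedback(n_arms=10, k=3, rounds=2000, feedback_rate=0.2, seed=0):
    """Combinatorial UCB sketch: estimates are updated only on the rounds
    where the user actually returns feedback (probability `feedback_rate`)."""
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(n_arms)]  # hidden reward probabilities
    counts = [0] * n_arms   # number of feedback observations per arm
    means = [0.0] * n_arms  # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, rounds + 1):
        def ucb(a):
            # Unexplored arms get priority via an infinite index
            if counts[a] == 0:
                return float("inf")
            return means[a] + math.sqrt(2 * math.log(t) / counts[a])
        # Super-arm: the k arms with the highest UCB indices
        super_arm = sorted(range(n_arms), key=ucb, reverse=True)[:k]
        rewards = [1.0 if rng.random() < true_means[a] else 0.0 for a in super_arm]
        total_reward += sum(rewards)
        # Scarce explicit feedback: the user reports per-arm rewards
        # only on some rounds; otherwise the estimates stay unchanged
        if rng.random() < feedback_rate:
            for a, r in zip(super_arm, rewards):
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
    return total_reward / rounds

avg = cucb_scarce_feedback()
```

Even with updates on only ~20% of rounds, the UCB indices still concentrate on high-reward arms, just more slowly; this is the intuition the abstract's result rests on, though the paper's three strategies are presumably more refined than this uniform-sampling sketch.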
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns on user feedback from previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z)
- Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
- Kernelized Offline Contextual Dueling Bandits [15.646879026749168]
In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback.
We give an upper-confidence-bound style algorithm for this setting and prove a regret bound.
arXiv Detail & Related papers (2023-07-21T01:17:31Z)
- Multi-View Interactive Collaborative Filtering [0.0]
We propose a novel partially online latent factor recommender algorithm that incorporates both rating and contextual information.
MV-ICTR significantly increases performance on datasets with high percentages of cold-start users and items.
arXiv Detail & Related papers (2023-05-14T20:31:37Z)
- Multi-Action Dialog Policy Learning from Logged User Feedback [28.4271696269512]
Multi-action dialog policy generates multiple atomic dialog actions per turn.
Due to data limitations, existing policy models generalize poorly toward unseen dialog flows.
We propose BanditMatch to improve multi-action dialog policy learning with explicit and implicit turn-level user feedback.
arXiv Detail & Related papers (2023-02-27T04:01:28Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then move forward to the more challenging case, where users are divided among two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System [0.0]
Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges in the multi-armed bandits algorithm and collaborative filtering.
arXiv Detail & Related papers (2021-06-21T07:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.