Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users'
Feedback
- URL: http://arxiv.org/abs/2009.07518v1
- Date: Wed, 16 Sep 2020 07:32:51 GMT
- Title: Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users'
Feedback
- Authors: Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
- Abstract summary: We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite a limited amount of feedback returned by users (as low as 20% of the total), our approach achieves results similar to those of state-of-the-art approaches.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed
Bandits (COM-MAB) show good results on a global accuracy metric. In the case of
recommender systems, this can be achieved through personalization. However,
with a combinatorial online learning approach, personalization requires a large
amount of user feedback. Such feedback can be hard to acquire when users need
to be solicited directly and frequently. For many fields of activity undergoing
the digitization of their business, online learning is unavoidable. Thus, a
number of approaches allowing implicit user feedback retrieval have been
implemented. Nevertheless, this implicit feedback can be misleading or
inefficient for the agent's learning. Herein, we propose a novel approach that
reduces the number of explicit feedback requests required by Combinatorial
Multi-Armed Bandit (COM-MAB) algorithms while providing levels of global
accuracy and learning efficiency similar to those of classical competitive
methods. In this paper we present a novel approach for considering user
feedback and evaluate it using three distinct strategies. Despite a limited
amount of feedback returned by users (as low as 20% of the total), our approach
obtains results similar to those of state-of-the-art approaches.
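The abstract describes a COM-MAB agent learning under scarce explicit feedback, but this listing does not give the authors' actual strategies. The following is a minimal illustrative sketch, not the paper's method: a combinatorial UCB agent that plays a super-arm of k arms each round yet only receives per-arm (semi-bandit) feedback on a fraction of rounds, controlled by a hypothetical `feedback_rate` parameter set to the 20% figure from the abstract. All names, parameters, and the reward model are assumptions made for illustration.

```python
import math
import random

def cucb_scarce_feedback(n_arms=10, k=3, rounds=2000, feedback_rate=0.2, seed=0):
    """Combinatorial UCB sketch: estimates are updated only on the rounds
    where the user actually returns feedback (probability `feedback_rate`)."""
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(n_arms)]  # hidden reward probabilities
    counts = [0] * n_arms   # number of feedback observations per arm
    means = [0.0] * n_arms  # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, rounds + 1):
        def ucb(a):
            # Unexplored arms get priority via an infinite index
            if counts[a] == 0:
                return float("inf")
            return means[a] + math.sqrt(2 * math.log(t) / counts[a])
        # Super-arm: the k arms with the highest UCB indices
        super_arm = sorted(range(n_arms), key=ucb, reverse=True)[:k]
        rewards = [1.0 if rng.random() < true_means[a] else 0.0 for a in super_arm]
        total_reward += sum(rewards)
        # Scarce explicit feedback: the user reports per-arm rewards
        # only on some rounds; otherwise the estimates stay unchanged
        if rng.random() < feedback_rate:
            for a, r in zip(super_arm, rewards):
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
    return total_reward / rounds

avg = cucb_scarce_feedback()
```

Even with updates on only ~20% of rounds, the UCB indices still concentrate on high-reward arms, just more slowly; this is the intuition the abstract's result rests on, though the paper's three strategies are presumably more refined than this uniform-sampling sketch.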
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns on user feedback from previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z)
- Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
- Kernelized Offline Contextual Dueling Bandits [15.646879026749168]
In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback.
We give an upper-confidence-bound style algorithm for this setting and prove a regret bound.
arXiv Detail & Related papers (2023-07-21T01:17:31Z)
- Multi-View Interactive Collaborative Filtering [0.0]
We propose a novel partially online latent factor recommender algorithm that incorporates both rating and contextual information.
MV-ICTR significantly increases performance on datasets with high percentages of cold-start users and items.
arXiv Detail & Related papers (2023-05-14T20:31:37Z)
- Multi-Action Dialog Policy Learning from Logged User Feedback [28.4271696269512]
Multi-action dialog policy generates multiple atomic dialog actions per turn.
Due to data limitations, existing policy models generalize poorly toward unseen dialog flows.
We propose BanditMatch to improve multi-action dialog policy learning with explicit and implicit turn-level user feedback.
arXiv Detail & Related papers (2023-02-27T04:01:28Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then move forward to the more challenging case, where users are divided among two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System [0.0]
Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges in the multi-armed bandits algorithm and collaborative filtering.
arXiv Detail & Related papers (2021-06-21T07:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.