Towards Validating Long-Term User Feedbacks in Interactive
Recommendation Systems
- URL: http://arxiv.org/abs/2308.11137v1
- Date: Tue, 22 Aug 2023 02:34:47 GMT
- Title: Towards Validating Long-Term User Feedbacks in Interactive
Recommendation Systems
- Authors: Hojoon Lee, Dongyoon Hwang, Kyushik Min, Jaegul Choo
- Abstract summary: Interactive Recommender Systems (IRSs) have attracted a lot of attention, due to their ability to model interactive processes between users and recommender systems.
We revisit experiments on IRSs with review datasets and compare RL-based models against a simple reward model that greedily recommends the item with the highest one-step reward.
- Score: 36.45966630580796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive Recommender Systems (IRSs) have attracted a lot of attention, due
to their ability to model interactive processes between users and recommender
systems. Numerous approaches have adopted Reinforcement Learning (RL)
algorithms, as these can directly maximize users' cumulative rewards. In IRS,
researchers commonly utilize publicly available review datasets to compare and
evaluate algorithms. However, the user feedback provided in public datasets
only includes instant responses (e.g., a rating) and omits delayed responses
(e.g., dwell time and lifetime value). Thus, the question
remains whether these review datasets are an appropriate choice to evaluate the
long-term effects of the IRS. In this work, we revisited experiments on IRS
with review datasets and compared RL-based models with a simple reward model
that greedily recommends the item with the highest one-step reward. Our
extensive analysis reveals three main findings: First, a simple greedy reward
model consistently outperforms RL-based models in maximizing cumulative
rewards. Second, applying higher weighting to long-term rewards degrades
recommendation performance. Third, user feedback has only marginal long-term
effects on the benchmark datasets. Based on our findings, we conclude
that a dataset has to be carefully verified and that a simple greedy baseline
should be included for a proper evaluation of RL-based IRS approaches.
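To ground the comparison, below is a minimal sketch of the kind of greedy one-step reward baseline the abstract refers to: a reward model predicts the immediate reward of each candidate item, and the item with the highest predicted reward is recommended at every step. The `RewardModel` class, its embedding-based scorer, and the interaction loop are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class RewardModel:
    """Illustrative one-step reward model: predicts the immediate reward
    (e.g., a rating) a user would give to each candidate item."""

    def __init__(self, n_users: int, n_items: int, dim: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Stand-ins for learned embeddings (e.g., from matrix factorization).
        self.user_emb = rng.normal(size=(n_users, dim))
        self.item_emb = rng.normal(size=(n_items, dim))

    def predict(self, user_id: int) -> np.ndarray:
        # Predicted one-step reward for every item.
        return self.item_emb @ self.user_emb[user_id]


def greedy_recommend(model: RewardModel, user_id: int, consumed: set) -> int:
    """Greedy baseline: pick the item with the highest predicted one-step
    reward, ignoring any long-term (cumulative) objective."""
    scores = model.predict(user_id)
    scores[list(consumed)] = -np.inf  # do not re-recommend consumed items
    return int(np.argmax(scores))


# Minimal interaction loop: the cumulative reward collected here is the
# quantity the paper compares against RL-based recommenders.
model = RewardModel(n_users=100, n_items=500)
consumed, cumulative_reward = set(), 0.0
for step in range(10):
    item = greedy_recommend(model, user_id=7, consumed=consumed)
    consumed.add(item)
    cumulative_reward += model.predict(7)[item]  # simulated immediate feedback
```

An RL-based recommender would differ only in the selection rule, replacing the argmax over predicted one-step rewards with an argmax over learned Q-values that discount delayed feedback.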
Related papers
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [63.32585910975191]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
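A loose sketch of the relabeling idea, assuming a simple quality-tag conditioning format (the field names and the `<quality=...>` tag are illustrative, not taken from the paper):

```python
def reward_augment(pairs):
    """Loose sketch of reward-conditioned relabeling: both the chosen and the
    rejected response become training targets, each conditioned on its own
    quality score. Field names and the <quality=...> tag are illustrative."""
    augmented = []
    for p in pairs:
        for response, score in ((p["chosen"], p["score_chosen"]),
                                (p["rejected"], p["score_rejected"])):
            augmented.append({
                # The target quality becomes part of the conditioning context.
                "prompt": f"<quality={score}> " + p["prompt"],
                "response": response,
            })
    return augmented
```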
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
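Evaluation on such prompt-chosen-rejected trios typically reduces to pairwise accuracy; a minimal sketch, where `score(prompt, response)` stands in for any reward model:

```python
def pairwise_accuracy(trios, score):
    """Fraction of prompt-chosen-rejected trios where the reward model ranks
    the chosen response above the rejected one. `score(prompt, response)`
    stands in for an arbitrary reward model."""
    correct = sum(
        score(t["prompt"], t["chosen"]) > score(t["prompt"], t["rejected"])
        for t in trios
    )
    return correct / max(len(trios), 1)
```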
- Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation [28.218427886174506]
We propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL).
On the one hand, we learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.
Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods.
arXiv Detail & Related papers (2023-10-25T11:43:29Z)
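A generic, CQL-style sketch of conservative value learning (not the authors' exact objective): Q-values on unlogged actions are pushed down relative to logged ones to curb overestimation when learning long-term engagement offline.

```python
import torch
import torch.nn.functional as F

def conservative_q_loss(q_net, states, actions, targets, alpha=1.0):
    """Generic CQL-style conservative value-learning loss (a sketch, not the
    authors' exact objective). Q-values on unlogged items are pushed down
    relative to logged ones to curb offline overestimation."""
    q_all = q_net(states)                                  # (batch, n_items)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_data, targets)                  # standard TD regression
    # logsumexp over all items approximates the maximal unlogged Q-value.
    penalty = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    return td_loss + alpha * penalty
```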
- Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems [20.395091290715502]
A single user's feedback on the same item can vary randomly over time.
We design two reward stabilization frameworks that replace the direct feedback with that learned by a supervised model.
arXiv Detail & Related papers (2023-08-25T08:42:45Z)
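A minimal sketch of the stabilization idea, assuming a supervised reward model with a `predict(state, action)` interface: the noisy observed feedback is replaced by (or blended with) the model's prediction before it reaches the RL learner.

```python
def stabilized_reward(reward_model, state, action, observed_reward, blend=1.0):
    """Reward stabilization sketch: the noisy observed feedback is replaced by
    (or blended with) the prediction of a supervised reward model, so the RL
    agent trains on a lower-variance signal. `reward_model.predict` is an
    assumed interface."""
    predicted = reward_model.predict(state, action)
    return blend * predicted + (1.0 - blend) * observed_reward
```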
- Robust Reinforcement Learning Objectives for Sequential Recommender Systems [7.44049827436013]
We develop recommender systems that incorporate direct user feedback in the form of rewards, enhancing personalization for users.
However, employing RL algorithms presents challenges, including off-policy training, expansive action spaces, and the scarcity of datasets with sufficient reward signals.
We introduce an enhanced methodology aimed at providing a more effective solution to these challenges.
arXiv Detail & Related papers (2023-05-30T08:09:08Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that the exploration bonus from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
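A minimal sketch of an uncertainty-based bonus, assuming an ensemble of learned reward models with a `predict(state, action)` interface: disagreement across the ensemble is added to the mean learned reward as a novelty signal.

```python
import numpy as np

def uncertainty_bonus_reward(reward_ensemble, state, action, beta=0.1):
    """Exploration sketch: the mean of an ensemble of learned reward models is
    the extrinsic reward, and their disagreement (standard deviation) is added
    as an uncertainty-based novelty bonus. `predict` is an assumed interface."""
    preds = np.array([rm.predict(state, action) for rm in reward_ensemble])
    return preds.mean() + beta * preds.std()
```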
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
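A minimal sketch of the sampled-advantage computation, assuming `q_net(state)` returns a vector of per-item Q-values: the positive (observed) item is compared against the average of the sampled negatives.

```python
def sampled_advantage(q_net, state, positive_item, negative_items):
    """Sampled-advantage sketch: compare the Q-value of the positive (observed)
    item against the average Q-value of sampled negative items; the gap serves
    as the advantage weighting the supervised (actor) loss. `q_net(state)` is
    assumed to return a 1-D array/tensor of per-item Q-values."""
    q = q_net(state)
    q_pos = q[positive_item]
    q_neg_mean = sum(q[i] for i in negative_items) / len(negative_items)
    return q_pos - q_neg_mean
```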
- Knowledge Graph-enhanced Sampling for Conversational Recommender System [20.985222879085832]
A Conversational Recommendation System (CRS) uses the interactive form of dialogue systems to address the problems of traditional recommendation systems.
This work proposes a contextual information enhancement model tailored for CRS, called Knowledge Graph-enhanced Sampling (KGenSam).
Two samplers are designed to enhance knowledge by sampling fuzzy samples with high uncertainty for obtaining user preferences and reliable negative samples for updating the recommender.
arXiv Detail & Related papers (2021-10-13T11:00:50Z)
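A rough sketch of the uncertainty-driven (fuzzy) sampler, assuming a `preference_prob(candidate)` estimate from the current user model; the negative sampler is omitted.

```python
import numpy as np

def fuzzy_sample(candidates, preference_prob):
    """Uncertainty-driven (fuzzy) sampling sketch: ask about the candidate
    whose predicted preference probability has maximal entropy, i.e., is
    closest to 0.5. `preference_prob(candidate)` is an assumed estimate from
    the current user model."""
    def entropy(p):
        return -(p * np.log(p + 1e-9) + (1 - p) * np.log(1 - p + 1e-9))
    return max(candidates, key=lambda c: entropy(preference_prob(c)))
```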
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
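A simplified active reward-learning sketch, cruder than IDRL's information-directed criterion: the query on which an assumed ensemble of learned reward models disagrees most is sent to the expert.

```python
import numpy as np

def select_query(candidate_queries, reward_ensemble):
    """Simplified active reward-learning sketch (cruder than IDRL's
    information-directed criterion): send the expert the query on which an
    ensemble of learned reward models disagrees most. `predict` is an
    assumed interface."""
    def disagreement(query):
        preds = np.array([rm.predict(query) for rm in reward_ensemble])
        return preds.var()
    return max(candidate_queries, key=disagreement)
```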