PrefRec: Recommender Systems with Human Preferences for Reinforcing
Long-term User Engagement
- URL: http://arxiv.org/abs/2212.02779v2
- Date: Fri, 2 Jun 2023 16:19:03 GMT
- Title: PrefRec: Recommender Systems with Human Preferences for Reinforcing
Long-term User Engagement
- Authors: Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong
Zheng, Peng Jiang, Kun Gai, Bo An
- Abstract summary: We propose a novel paradigm, recommender systems with human preferences (or Preference-based Recommender systems, PrefRec for short).
With PrefRec, we can fully exploit the advantages of RL in optimizing long-term goals, while avoiding complex reward engineering.
- Score: 36.95056214316392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current advances in recommender systems have been remarkably successful in
optimizing immediate engagement. However, long-term user engagement, a more
desirable performance metric, remains difficult to improve. Meanwhile, recent
reinforcement learning (RL) algorithms have shown their effectiveness in a
variety of long-term goal optimization tasks. For this reason, RL is widely
considered a promising framework for optimizing long-term user engagement in
recommendation. Though promising, the application of RL heavily relies on
well-designed rewards, but designing rewards related to long-term user
engagement is quite difficult. To mitigate this problem, we propose a novel
paradigm, recommender systems with human preferences (or Preference-based
Recommender systems, PrefRec for short), which allows RL recommender systems to
learn from preferences about users' historical behaviors rather than explicitly defined
rewards. Such preferences are easily accessible through techniques such as
crowdsourcing, as they do not require any expert knowledge. With PrefRec, we
can fully exploit the advantages of RL in optimizing long-term goals, while
avoiding complex reward engineering. PrefRec uses the preferences to
automatically train a reward function in an end-to-end manner. The reward
function is then used to generate learning signals to train the recommendation
policy. Furthermore, we design an effective optimization method for PrefRec,
which uses an additional value function, expectile regression and reward model
pre-training to improve the performance. We conduct experiments on a variety of
long-term user engagement optimization tasks. The results show that PrefRec
significantly outperforms previous state-of-the-art methods in all the tasks.
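To make the mechanism described above concrete, the following is a minimal sketch (in PyTorch) of the two ingredients the abstract names: a reward model trained end-to-end from pairwise preferences with a Bradley-Terry style cross-entropy loss, and an expectile-regression objective for the additional value function. The function names, tensor shapes, and the expectile parameter tau are illustrative assumptions, not PrefRec's exact formulation.

import torch
import torch.nn.functional as F

def preference_reward_loss(reward_model, seg_a, seg_b, prefer_a):
    # seg_a, seg_b: (batch, steps, feat) features of two user-interaction segments;
    # prefer_a: (batch,) float labels, 1.0 if annotators preferred segment A, else 0.0.
    r_a = reward_model(seg_a).sum(dim=(1, 2))  # cumulative predicted reward of segment A
    r_b = reward_model(seg_b).sum(dim=(1, 2))  # cumulative predicted reward of segment B
    # Bradley-Terry model: P(A preferred) = sigmoid(R_A - R_B).
    return F.binary_cross_entropy_with_logits(r_a - r_b, prefer_a)

def expectile_value_loss(value, target, tau=0.7):
    # Asymmetric (expectile) regression: with tau > 0.5, underestimating the target
    # is penalised more than overestimating it, as in IQL-style offline RL.
    diff = target - value
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()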
Related papers
- Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction [11.109665449393738]
We present the Learned Ranking Function (LRF), a system that takes short-term user-item behavior predictions as input and outputs a slate of recommendations.
We propose to model the problem directly as a slate optimization problem with the objective of maximizing long-term user satisfaction.
arXiv Detail & Related papers (2024-08-12T22:02:39Z)
- Combining Automated Optimisation of Hyperparameters and Reward Shape [7.407166175374958]
We propose a methodology for the combined optimisation of hyperparameters and the reward function.
We conducted extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic.
Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others.
arXiv Detail & Related papers (2024-06-26T12:23:54Z)
- Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions.
CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs.
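As an illustration of the "simple contrastive objective" mentioned above, here is a rough sketch of a CPL-style loss, under the assumption that each segment is scored by a discounted sum of alpha-weighted policy log-probabilities; the variable names and hyperparameters are placeholders, not the paper's exact implementation.

import torch
import torch.nn.functional as F

def cpl_style_loss(logp_pref, logp_rej, alpha=0.1, gamma=1.0):
    # logp_pref / logp_rej: (batch, steps) log pi(a_t | s_t) evaluated on the
    # preferred and rejected segments of each labelled pair.
    steps = logp_pref.shape[1]
    discount = gamma ** torch.arange(steps, dtype=logp_pref.dtype, device=logp_pref.device)
    score_pref = (alpha * logp_pref * discount).sum(dim=1)
    score_rej = (alpha * logp_rej * discount).sum(dim=1)
    # Contrastive cross-entropy: push the preferred segment's score above the other's.
    logits = score_pref - score_rej
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))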
arXiv Detail & Related papers (2023-10-20T16:37:56Z)
- AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement [25.18963930580529]
We introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue.
AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories.
We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks.
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
- Reinforcing User Retention in a Billion Scale Short Video Recommender System [21.681785801465328]
Short video platforms have achieved rapid user growth by recommending interesting content to users.
The recommendation objective is to optimize user retention, thereby driving the growth of DAU (Daily Active Users).
arXiv Detail & Related papers (2023-02-03T13:25:43Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
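A minimal sketch of this idea, assuming the uncertainty is measured as disagreement within an ensemble of learned reward models (the ensemble size, the scaling coefficient beta, and any schedule on it are assumptions rather than the paper's exact recipe):

import torch

def reward_with_exploration_bonus(reward_ensemble, state, action, beta=0.05):
    # Each member of reward_ensemble maps (state, action) -> predicted reward.
    preds = torch.stack([r_hat(state, action) for r_hat in reward_ensemble], dim=0)
    extrinsic = preds.mean(dim=0)  # learned reward estimate
    bonus = preds.std(dim=0)       # ensemble disagreement as a novelty signal
    return extrinsic + beta * bonus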
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- B-Pref: Benchmarking Preference-Based Reinforcement Learning [84.41494283081326]
We introduce B-Pref, a benchmark specially designed for preference-based RL.
A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly.
B-Pref alleviates this by simulating teachers with a wide array of irrationalities.
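For intuition, a simulated teacher of this kind might look like the sketch below, where a rationality temperature, a random-error rate, and a skip margin stand in for the irrationalities the benchmark models; the parameter names and values are illustrative assumptions, not B-Pref's exact teacher design.

import math
import random

def simulated_teacher(return_a, return_b, rationality=1.0, error_rate=0.1, skip_margin=0.05):
    # Returns 1 if segment A is preferred, 0 if B is preferred, or None if the
    # query is skipped because the two segments look too similar.
    if abs(return_a - return_b) < skip_margin:
        return None
    # Boltzmann-rational choice based on the (hidden) ground-truth returns.
    p_a = 1.0 / (1.0 + math.exp(-rationality * (return_a - return_b)))
    label = 1 if random.random() < p_a else 0
    # Occasionally flip the label to model outright mistakes.
    if random.random() < error_rate:
        label = 1 - label
    return label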
arXiv Detail & Related papers (2021-11-04T17:32:06Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Reward Constrained Interactive Recommendation with Natural Language Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.