Optimizing Long-term Value for Auction-Based Recommender Systems via
On-Policy Reinforcement Learning
- URL: http://arxiv.org/abs/2305.13747v3
- Date: Sun, 30 Jul 2023 08:08:28 GMT
- Title: Optimizing Long-term Value for Auction-Based Recommender Systems via
On-Policy Reinforcement Learning
- Authors: Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He,
Alex Nikulkov, Zheqing Zhu
- Abstract summary: Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics.
We employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system.
- Score: 4.980374959955476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming arguments, which show that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system which handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
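The abstract's core recipe can be sketched in a few lines: learn a state value with TD(0), then re-rank auction candidates by immediate auction score plus the discounted learned value of the resulting state. This is a minimal illustration, not the authors' production code; the `bid` and `next_state` fields, the tabular value store, and the constants are all illustrative assumptions.

```python
# Minimal sketch of TD(0) value learning plus a one-step policy
# improvement over an auction base policy (illustrative, not the
# paper's production implementation).

GAMMA = 0.9   # discount factor weighting long-term engagement
ALPHA = 0.1   # TD learning rate

def td_update(values, state, reward, next_state):
    """One TD(0) update: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))."""
    v = values.get(state, 0.0)
    target = reward + GAMMA * values.get(next_state, 0.0)
    values[state] = v + ALPHA * (target - v)

def improved_rank(candidates, values):
    """One-step policy improvement: pick the auction candidate maximizing
    immediate bid plus discounted long-term value of its next state."""
    return max(
        candidates,
        key=lambda c: c["bid"] + GAMMA * values.get(c["next_state"], 0.0),
    )
```

Because the long-term value only adds a bonus term to each candidate's score, the re-ranking stays compatible with a score-based auction, which matches the compatibility claim in the abstract.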
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns from user feedback on previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z) - Preference Discerning with LLM-Enhanced Generative Retrieval [28.309905847867178]
We propose a new paradigm, which we term preference discerning.
In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context.
We generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data.
arXiv Detail & Related papers (2024-12-11T18:26:55Z) - Incorporate LLMs with Influential Recommender System [34.5820082133773]
Proactive recommender systems recommend a sequence of items to guide user interest toward a target item.
Existing methods struggle to construct a coherent influence path that builds up with items the user is likely to enjoy.
We introduce a novel approach named LLM-based Influence Path Planning (LLM-IPP)
Our approach maintains coherence between consecutive recommendations and enhances user acceptability of the recommended items.
arXiv Detail & Related papers (2024-09-07T13:41:37Z) - Towards Off-Policy Reinforcement Learning for Ranking Policies with
Human Feedback [47.03475305565384]
We propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize user long-term rewards and optimize the ranking metric offline.
We show that the EM process guides the learned policy to integrate both the future reward and the ranking metric, and to learn without any online interactions.
arXiv Detail & Related papers (2024-01-17T04:19:33Z) - Fisher-Weighted Merge of Contrastive Learning Models in Sequential
Recommendation [0.0]
We are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it.
We demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
arXiv Detail & Related papers (2023-07-05T05:58:56Z) - Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z) - A Review on Pushing the Limits of Baseline Recommendation Systems with
the integration of Opinion Mining & Information Retrieval Techniques [0.0]
Recommendation systems help users identify trending items within a community while remaining timely and relevant to the user's expectations.
Deep Learning methods have been brought forward to achieve better quality recommendations.
Researchers have tried to expand on the capabilities of standard recommendation systems to provide the most effective recommendations.
arXiv Detail & Related papers (2022-05-03T22:13:33Z) - D2RLIR : an improved and diversified ranking function in interactive
recommendation systems based on deep reinforcement learning [0.3058685580689604]
This paper proposes a deep reinforcement learning based recommendation system by utilizing Actor-Critic architecture.
The proposed model generates a diverse yet relevant recommendation list based on the user's preferences.
arXiv Detail & Related papers (2021-10-28T13:11:29Z) - PURS: Personalized Unexpected Recommender System for Improving User
Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process.
Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z) - Improving Long-Term Metrics in Recommendation Systems using
Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z) - Reward Constrained Interactive Recommendation with Natural Language
Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
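The constraint-augmented idea in the last entry can be sketched as a penalized reward: a discriminator's estimated probability that a recommendation violates the user's historical preference is subtracted from the base reward. The discriminator here is a stand-in callable and the penalty weight is an assumed hyperparameter, not the paper's learned model.

```python
# Hedged sketch of constraint-augmented RL for recommendation: a
# discriminator score penalizes candidate recommendations that likely
# violate historical user preferences (illustrative stand-in only).

def constrained_reward(base_reward, violation_prob, penalty_weight=1.0):
    """Augment the RL reward with a discriminator-based constraint penalty."""
    return base_reward - penalty_weight * violation_prob

def pick_action(candidates, discriminator, penalty_weight=1.0):
    """Choose the candidate with the highest penalized reward."""
    return max(
        candidates,
        key=lambda c: constrained_reward(c["reward"], discriminator(c), penalty_weight),
    )
```

In this toy form, a candidate with a slightly lower base reward but a much lower violation probability can win the selection, which is the intended effect of the constraint.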
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.