Optimizing Long-term Value for Auction-Based Recommender Systems via
On-Policy Reinforcement Learning
- URL: http://arxiv.org/abs/2305.13747v3
- Date: Sun, 30 Jul 2023 08:08:28 GMT
- Title: Optimizing Long-term Value for Auction-Based Recommender Systems via
On-Policy Reinforcement Learning
- Authors: Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He,
Alex Nikulkov, Zheqing Zhu
- Abstract summary: Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics.
We employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system.
- Score: 4.980374959955476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming, which shows that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system which handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
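The abstract's core mechanism can be sketched in a few lines: estimate a state-value function with TD(0), then rank auction candidates by an immediate score biased with a one-step lookahead of long-term value. This is a minimal, hypothetical illustration; the constants (`GAMMA`, `ALPHA`), state names, and scoring form are assumptions for exposition, not the authors' production implementation.

```python
# Hypothetical sketch of the approach described in the abstract:
# learn a state-value function V with TD(0) (temporal difference
# learning), then bias the auction's immediate-return score with a
# one-step lookahead so that recommendations leading to states with
# higher long-term engagement value rank higher.

GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # TD learning rate (assumed)

def td0_update(V, s, r, s_next):
    """One TD(0) update: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_error = r + GAMMA * V[s_next] - V[s]
    V[s] += ALPHA * td_error
    return V

def one_step_improved_score(base_score, r_immediate, v_next):
    """One-step policy improvement: augment the auction's immediate
    score with the discounted value of the successor state."""
    return base_score + r_immediate + GAMMA * v_next

# Toy usage: two candidate items competing for one recommendation slot.
V = {"engaged": 2.0, "churn_risk": 0.2}
item_a = one_step_improved_score(base_score=1.0, r_immediate=0.5, v_next=V["engaged"])
item_b = one_step_improved_score(base_score=1.2, r_immediate=0.6, v_next=V["churn_risk"])
best = "A" if item_a > item_b else "B"   # A wins despite a lower base score
```

The key design point is that the improved score remains an additive adjustment to the existing auction score, which is what keeps the method compatible with the auction framework while still provably improving on the base policy.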
Related papers
- Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback [47.03475305565384]
We propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize user long-term rewards and optimize the ranking metric offline.
We show that the EM process guides the learned policy to enjoy the benefit of integrating the future reward and the ranking metric, and to learn without any online interactions.
arXiv Detail & Related papers (2024-01-17T04:19:33Z)
- Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation [0.0]
We are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it.
We demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
arXiv Detail & Related papers (2023-07-05T05:58:56Z)
- Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective [11.31980071390936]
We present a novel podcast recommender system deployed at industrial scale.
Deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests.
arXiv Detail & Related papers (2023-02-07T16:17:25Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z)
- A Review on Pushing the Limits of Baseline Recommendation Systems with the Integration of Opinion Mining & Information Retrieval Techniques [0.0]
Recommendation systems allow users to identify trending items within a community while remaining timely and relevant to the user's expectations.
Deep Learning methods have been brought forward to achieve better quality recommendations.
Researchers have tried to expand on the capabilities of standard recommendation systems to provide the most effective recommendations.
arXiv Detail & Related papers (2022-05-03T22:13:33Z)
- D2RLIR: an improved and diversified ranking function in interactive recommendation systems based on deep reinforcement learning [0.3058685580689604]
This paper proposes a deep reinforcement learning-based recommendation system built on an Actor-Critic architecture.
The proposed model generates a diverse yet relevant recommendation list based on the user's preferences.
arXiv Detail & Related papers (2021-10-28T13:11:29Z)
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process.
Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Reward Constrained Interactive Recommendation with Natural Language Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.