Optimizing Long-term Value for Auction-Based Recommender Systems via
On-Policy Reinforcement Learning
- URL: http://arxiv.org/abs/2305.13747v3
- Date: Sun, 30 Jul 2023 08:08:28 GMT
- Title: Optimizing Long-term Value for Auction-Based Recommender Systems via
On-Policy Reinforcement Learning
- Authors: Ruiyang Xu, Jalaj Bhandari, Dmytro Korenkevych, Fan Liu, Yuchen He,
Alex Nikulkov, Zheqing Zhu
- Abstract summary: Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics.
We employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system.
- Score: 4.980374959955476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming arguments, which show that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system which handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
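The abstract's core recipe can be sketched in a few lines: learn a state value with TD(0), then re-rank auction candidates by immediate auction score plus the discounted learned value of the resulting state. This is a minimal illustration, not the authors' production code; the `bid` and `next_state` fields, the tabular value store, and the constants are all illustrative assumptions.

```python
# Minimal sketch of TD(0) value learning plus a one-step policy
# improvement over an auction base policy (illustrative, not the
# paper's production implementation).

GAMMA = 0.9   # discount factor weighting long-term engagement
ALPHA = 0.1   # TD learning rate

def td_update(values, state, reward, next_state):
    """One TD(0) update: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))."""
    v = values.get(state, 0.0)
    target = reward + GAMMA * values.get(next_state, 0.0)
    values[state] = v + ALPHA * (target - v)

def improved_rank(candidates, values):
    """One-step policy improvement: pick the auction candidate maximizing
    immediate bid plus discounted long-term value of its next state."""
    return max(
        candidates,
        key=lambda c: c["bid"] + GAMMA * values.get(c["next_state"], 0.0),
    )
```

Because the long-term value only adds a bonus term to each candidate's score, the re-ranking stays compatible with a score-based auction, which matches the compatibility claim in the abstract.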
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns from user feedback on previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z) - Preference Discerning with LLM-Enhanced Generative Retrieval [28.309905847867178]
We propose a new paradigm, which we term preference discerning.
In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context.
We generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data.
arXiv Detail & Related papers (2024-12-11T18:26:55Z) - Incorporate LLMs with Influential Recommender System [34.5820082133773]
Proactive recommender systems recommend a sequence of items to guide user interest toward a target item.
Existing methods struggle to construct a coherent influence path that builds up with items the user is likely to enjoy.
We introduce a novel approach named LLM-based Influence Path Planning (LLM-IPP)
Our approach maintains coherence between consecutive recommendations and enhances user acceptability of the recommended items.
arXiv Detail & Related papers (2024-09-07T13:41:37Z) - Towards Off-Policy Reinforcement Learning for Ranking Policies with
Human Feedback [47.03475305565384]
We propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize user long-term rewards and optimize the ranking metric offline.
We show that the EM process guides the learned policy to integrate both the future reward and the ranking metric, and to learn without any online interactions.
arXiv Detail & Related papers (2024-01-17T04:19:33Z) - Fisher-Weighted Merge of Contrastive Learning Models in Sequential
Recommendation [0.0]
We are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it.
We demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
arXiv Detail & Related papers (2023-07-05T05:58:56Z) - Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z) - A Review on Pushing the Limits of Baseline Recommendation Systems with
the integration of Opinion Mining & Information Retrieval Techniques [0.0]
Recommendation systems help users identify trending items within a community while remaining timely and relevant to the user's expectations.
Deep Learning methods have been brought forward to achieve better quality recommendations.
Researchers have tried to expand on the capabilities of standard recommendation systems to provide the most effective recommendations.
arXiv Detail & Related papers (2022-05-03T22:13:33Z) - D2RLIR : an improved and diversified ranking function in interactive
recommendation systems based on deep reinforcement learning [0.3058685580689604]
This paper proposes a deep reinforcement learning based recommendation system by utilizing Actor-Critic architecture.
The proposed model generates a diverse yet relevant recommendation list based on the user's preferences.
arXiv Detail & Related papers (2021-10-28T13:11:29Z) - PURS: Personalized Unexpected Recommender System for Improving User
Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process.
Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z) - Improving Long-Term Metrics in Recommendation Systems using
Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z) - Reward Constrained Interactive Recommendation with Natural Language
Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
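The constraint-augmented idea in the last entry can be sketched as a penalized reward: a discriminator's estimated probability that a recommendation violates the user's historical preference is subtracted from the base reward. The discriminator here is a stand-in callable and the penalty weight is an assumed hyperparameter, not the paper's learned model.

```python
# Hedged sketch of constraint-augmented RL for recommendation: a
# discriminator score penalizes candidate recommendations that likely
# violate historical user preferences (illustrative stand-in only).

def constrained_reward(base_reward, violation_prob, penalty_weight=1.0):
    """Augment the RL reward with a discriminator-based constraint penalty."""
    return base_reward - penalty_weight * violation_prob

def pick_action(candidates, discriminator, penalty_weight=1.0):
    """Choose the candidate with the highest penalized reward."""
    return max(
        candidates,
        key=lambda c: constrained_reward(c["reward"], discriminator(c), penalty_weight),
    )
```

In this toy form, a candidate with a slightly lower base reward but a much lower violation probability can win the selection, which is the intended effect of the constraint.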
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.