Related papers: Reinforce Lifelong Interaction Value of User-Author Pairs for Large-Scale Recommendation Systems

URL: http://arxiv.org/abs/2507.16253v1
Date: Tue, 22 Jul 2025 05:58:55 GMT
Title: Reinforce Lifelong Interaction Value of User-Author Pairs for Large-Scale Recommendation Systems
Authors: Yisha Li, Lexi Gao, Jingxin Liu, Xiang Gao, Xin Li, Haiyang Lu, Liyin Hong,
Abstract summary: We introduce RL to Reinforce Lifelong Interaction Value of User-Author pairs (RLIV-UA) based on each interaction of UA pairs. In offline experiments and online A/B tests, the RLIV-UA model achieves both higher user satisfaction and higher platform profits than compared methods.
Score: 11.3015594568951
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recommendation systems (RS) help users find interested content and connect authors with their target audience. Most research in RS tends to focus either on predicting users' immediate feedback (like click-through rate) accurately or improving users' long-term engagement. However, they ignore the influence for authors and the lifelong interaction value (LIV) of user-author pairs, which is particularly crucial for improving the prosperity of social community in short-video platforms. Currently, reinforcement learning (RL) can optimize long-term benefits and has been widely applied in RS. In this paper, we introduce RL to Reinforce Lifelong Interaction Value of User-Author pairs (RLIV-UA) based on each interaction of UA pairs. To address the long intervals between UA interactions and the large scale of the UA space, we propose a novel Sparse Cross-Request Interaction Markov Decision Process (SCRI-MDP) and introduce an Adjacent State Approximation (ASA) method to construct RL training samples. Additionally, we introduce Multi-Task Critic Learning (MTCL) to capture the progressive nature of UA interactions (click -> follow -> gift), where denser interaction signals are leveraged to compensate for the learning of sparse labels. Finally, an auxiliary supervised learning task is designed to enhance the convergence of the RLIV-UA model. In offline experiments and online A/B tests, the RLIV-UA model achieves both higher user satisfaction and higher platform profits than compared methods.

Related papers

Reinforcement Learning from User Feedback [28.335218244885706]
We introduce Reinforcement Learning from User Feedback (RLUF), a framework for aligning large language models with user preferences. We train a reward model, P[Love], to predict the likelihood that an LLM response will receive a Love Reaction. We show that P[Love] is predictive of increased positive feedback and serves as a reliable offline evaluator of future user behavior.
arXiv Detail & Related papers (2025-05-20T22:14:44Z)
Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z)
Mind the Gap! Static and Interactive Evaluations of Large Audio Models [55.87220295533817]
Large Audio Models (LAMs) are designed to power voice-native experiences. This study introduces an interactive approach to evaluate LAMs and collect 7,500 LAM interactions from 484 participants.
arXiv Detail & Related papers (2025-02-21T20:29:02Z)
Improved Diversity-Promoting Collaborative Metric Learning for Recommendation [127.08043409083687]
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems. This paper focuses on a challenging scenario where a user has multiple categories of interests. We propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML).
arXiv Detail & Related papers (2024-09-02T07:44:48Z)
Retrieval Augmentation via User Interest Clustering [57.63883506013693]
Industrial recommender systems are sensitive to the patterns of user-item engagement. We propose a novel approach that efficiently constructs user interest and facilitates low computational cost inference. Our approach has been deployed in multiple products at Meta, facilitating short-form video related recommendation.
arXiv Detail & Related papers (2024-08-07T16:35:10Z)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts. RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling [66.02953670238647]
Tail users suffer from significantly lower-quality recommendations than head users after joint training. A model trained on tail users separately still achieves inferior results due to limited data. We propose a novel approach that significantly improves the recommendation performance of the tail users.
arXiv Detail & Related papers (2022-08-19T02:50:19Z)
Meta-Learning for Online Update of Recommender Systems [29.69934307878855]
MeLON is a novel online recommender update strategy that supports two-directional flexibility. MeLON learns how a recommender learns to generate the optimal learning rates for future updates.
arXiv Detail & Related papers (2022-03-19T16:27:30Z)
Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model. As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly. We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
Dynamic Embeddings for Interaction Prediction [2.5758502140236024]
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention. Recent studies have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings. We propose a novel method called DeePRed that addresses some of their limitations.
arXiv Detail & Related papers (2020-11-10T16:04:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.