DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward
- URL: http://arxiv.org/abs/2505.07257v1
- Date: Mon, 12 May 2025 06:18:31 GMT
- Title: DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward
- Authors: Yi Zhang, Ruihong Qiu, Xuwei Xu, Jiajun Liu, Sen Wang
- Abstract summary: Model-based offline reinforcement learning has emerged as a promising approach for recommender systems. DARLR is proposed to dynamically update world models to enhance recommendation policies. Experiments on four benchmark datasets demonstrate the superior performance of DARLR.
- Score: 14.323631574821123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based offline reinforcement learning (RL) has emerged as a promising approach for recommender systems, enabling effective policy learning by interacting with frozen world models. However, the reward functions in these world models, trained on sparse offline logs, often suffer from inaccuracies. Specifically, existing methods face two major limitations in addressing this challenge: (1) deterministic use of reward functions as static look-up tables, which propagates inaccuracies during policy learning, and (2) static uncertainty designs that fail to effectively capture decision risks and mitigate the impact of these inaccuracies. In this work, a dual-agent framework, DARLR, is proposed to dynamically update world models to enhance recommendation policies. To achieve this, a \textbf{\textit{selector}} is introduced to identify reference users by balancing similarity and diversity so that the \textbf{\textit{recommender}} can aggregate information from these users and iteratively refine reward estimations for dynamic reward shaping. Further, the statistical features of the selected users guide the dynamic adaptation of an uncertainty penalty to better align with evolving recommendation requirements. Extensive experiments on four benchmark datasets demonstrate the superior performance of DARLR, validating its effectiveness. The code is available at https://github.com/ArronDZhang/DARLR.
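The selector/recommender interplay described in the abstract can be illustrated with a small sketch. This is a hedged, illustrative reconstruction only: the MMR-style greedy selection rule, the equal-weight reward mixing, and the standard-deviation penalty below are assumptions for the sketch, not DARLR's exact formulation (the authors' released code at the GitHub link above is the authoritative implementation).

```python
import numpy as np

def select_reference_users(query_emb, user_embs, k=10, diversity_weight=0.3):
    """Hypothetical selector: pick reference users by balancing similarity
    to the query user with diversity among the selected set (greedy,
    MMR-style). All names and the selection rule are illustrative."""
    sims = user_embs @ query_emb / (
        np.linalg.norm(user_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    selected, candidates = [], list(range(len(user_embs)))
    while len(selected) < k and candidates:
        best, best_score = None, -np.inf
        for c in candidates:
            # diversity term: distance to the closest already-selected user
            div = (np.min(np.linalg.norm(user_embs[c] - user_embs[selected], axis=1))
                   if selected else 0.0)
            score = (1 - diversity_weight) * sims[c] + diversity_weight * div
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
        candidates.remove(best)
    return np.array(selected)

def shaped_reward(model_reward, ref_rewards, ref_weights, penalty_coef=1.0):
    """Refine the frozen world model's reward with a weighted aggregate over
    the reference users' rewards, and subtract an uncertainty penalty driven
    by the spread (std) of those rewards. Again a sketch of the idea, not
    the exact DARLR update."""
    aggregate = np.average(ref_rewards, weights=ref_weights)
    uncertainty = np.std(ref_rewards)
    return 0.5 * model_reward + 0.5 * aggregate - penalty_coef * uncertainty
```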
Related papers
- Reward Balancing Revisited: Enhancing Offline Reinforcement Learning for Recommender Systems [10.995830376373801]
We present an innovative offline RL framework termed Reallocated Reward for Recommender Systems (R3S). By integrating inherent model uncertainty to tackle the intrinsic fluctuations in reward predictions, we boost diversity for decision-making to align with a more interactive paradigm. The experimental results demonstrate that R3S improves the accuracy of world models and efficiently harmonizes the heterogeneous preferences of the users.
arXiv Detail & Related papers (2025-06-27T10:46:41Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - A Novel Generative Model with Causality Constraint for Mitigating Biases in Recommender Systems [20.672668625179526]
Latent confounding bias can obscure the true causal relationship between user feedback and item exposure. We propose a novel generative framework called Latent Causality Constraints for debiasing representation learning in recommender systems.
arXiv Detail & Related papers (2025-05-22T14:09:39Z) - ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems [14.74207332728742]
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems. This paper proposes ROLeR, a novel model-based reward shaping method in offline RL for recommender systems, for reward and uncertainty estimation.
arXiv Detail & Related papers (2024-07-18T05:07:11Z) - Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients [20.941908494137806]
This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery.
Despite its success, the state-of-the-art method, DSR, is built on recurrent neural networks and guided purely by data fitness.
We use transformers in conjunction with breadth-first-search to improve the learning performance.
arXiv Detail & Related papers (2024-06-10T19:29:10Z) - Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation [23.048841953423846]
We focus on the problem of learning to reward, which is fundamental to reinforcement learning.
Previous approaches introduce additional procedures for learning to reward, thereby increasing the complexity of optimization.
We propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties.
arXiv Detail & Related papers (2023-10-30T13:43:20Z) - Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data-sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling. Our model can automatically learn rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC); a minimal sketch of the two-head design follows below.
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
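A minimal sketch of the two-output-layer idea described in the SQN/SAC entry above: a shared sequence encoder feeds a self-supervised next-item head and a Q-learning head trained jointly. The GRU encoder, loss weighting, and absence of a target network are assumptions made for brevity, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoHeadSQN(nn.Module):
    """Illustrative SQN-style model: one encoder, two output layers
    (next-item logits for self-supervision, per-item Q-values for RL)."""

    def __init__(self, n_items, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb_dim, padding_idx=0)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.supervised_head = nn.Linear(hidden_dim, n_items)  # next-item logits
        self.q_head = nn.Linear(hidden_dim, n_items)           # Q(s, a) per item

    def forward(self, item_seq):
        h, _ = self.encoder(self.item_emb(item_seq))
        state = h[:, -1, :]  # last hidden state as the user state
        return self.supervised_head(state), self.q_head(state)

def sqn_loss(model, seq, next_item, reward, next_seq, gamma=0.9):
    """Joint loss: cross-entropy on the supervised head plus a one-step
    TD error on the Q head (simplified sketch, no target network)."""
    logits, q = model(seq)
    ce = nn.functional.cross_entropy(logits, next_item)
    with torch.no_grad():
        _, q_next = model(next_seq)
        target = reward + gamma * q_next.max(dim=1).values
    td = nn.functional.mse_loss(
        q.gather(1, next_item.unsqueeze(1)).squeeze(1), target)
    return ce + td
```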