Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation
- URL: http://arxiv.org/abs/2502.02327v1
- Date: Tue, 04 Feb 2025 13:58:20 GMT
- Title: Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation
- Authors: Siyu Wang, Xiaocong Chen, Lina Yao
- Abstract summary: Policy-Guided Causal Representation (PGCR) is a novel two-stage framework for causal feature selection and state representation learning in offline RLRS.
We show that PGCR significantly improves recommendation performance, confirming its effectiveness for offline RL-based recommender systems.
- Score: 17.750449033873036
- License:
- Abstract: In offline reinforcement learning-based recommender systems (RLRS), learning effective state representations is crucial for capturing user preferences that directly impact long-term rewards. However, raw state representations often contain high-dimensional, noisy information and components that are not causally relevant to the reward. Additionally, missing transitions in offline data make it challenging to accurately identify features that are most relevant to user satisfaction. To address these challenges, we propose Policy-Guided Causal Representation (PGCR), a novel two-stage framework for causal feature selection and state representation learning in offline RLRS. In the first stage, we learn a causal feature selection policy that generates modified states by isolating and retaining only the causally relevant components (CRCs) while altering irrelevant components. This policy is guided by a reward function based on the Wasserstein distance, which measures the causal effect of state components on the reward and encourages the preservation of CRCs that directly influence user interests. In the second stage, we train an encoder to learn compact state representations by minimizing the mean squared error (MSE) loss between the latent representations of the original and modified states, ensuring that the representations focus on CRCs. We provide a theoretical analysis proving the identifiability of causal effects from interventions, validating the ability of PGCR to isolate critical state components for decision-making. Extensive experiments demonstrate that PGCR significantly improves recommendation performance, confirming its effectiveness for offline RL-based recommender systems.
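The following PyTorch-style sketch is a rough illustration of the two stages described in the abstract: a Wasserstein-distance-based reward for scoring a candidate feature-selection policy (Stage 1), and an encoder trained with an MSE loss that aligns the latent representations of original and modified states (Stage 2). All module names, network shapes, the 1-D Wasserstein stand-in, and the placeholder data are assumptions for illustration only, not the paper's actual implementation.

```python
# Minimal sketch of the two PGCR stages described in the abstract.
# Names, shapes, and training details are illustrative assumptions,
# not taken from the paper's code.
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM = 64, 16

class Encoder(nn.Module):
    """Maps a raw (possibly noisy) state to a compact latent representation."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, s):
        return self.net(s)

def selection_reward(rewards_original, rewards_modified):
    """Stage 1 (schematic): score a candidate feature-selection policy.

    The abstract describes a reward based on the Wasserstein distance between
    the reward distributions induced by original vs. modified states; here the
    1-D closed form (mean absolute difference of sorted samples) is used purely
    as a stand-in. A small distance means the altered components were causally
    irrelevant, so the policy is rewarded for preserving only the relevant ones.
    """
    a, _ = torch.sort(rewards_original)
    b, _ = torch.sort(rewards_modified)
    w1 = (a - b).abs().mean()
    return -w1

def stage2_loss(encoder, original_states, modified_states):
    """Stage 2: align latents of original and modified states with MSE so the
    representation depends only on causally relevant components (CRCs)."""
    z_orig = encoder(original_states)
    z_mod = encoder(modified_states)
    return nn.functional.mse_loss(z_orig, z_mod)

# Usage sketch with random placeholders for one Stage-2 gradient step.
encoder = Encoder(STATE_DIM, LATENT_DIM)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
s = torch.randn(32, STATE_DIM)        # original states from the offline dataset
s_tilde = torch.randn(32, STATE_DIM)  # states with irrelevant components altered
loss = stage2_loss(encoder, s, s_tilde)
opt.zero_grad(); loss.backward(); opt.step()
```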
Related papers
- Causal Information Prioritization for Efficient Reinforcement Learning [21.74375718642216]
Current Reinforcement Learning (RL) methods often suffer from sample inefficiency.
Recent causal approaches aim to address this problem, but they lack grounded modeling of reward-guided causal understanding of states and actions.
We propose a novel method named Causal Information Prioritization (CIP) that improves sample efficiency by leveraging factored MDPs.
arXiv Detail & Related papers (2025-02-14T11:44:17Z) - On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems [17.750449033873036]
In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces.
We introduce an innovative causal approach for decomposing the state and extracting Causal-Indispensable State Representations.
arXiv Detail & Related papers (2024-07-18T01:41:05Z) - A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables.
Despite the proliferation of lightweight embedding-based RSs (LERSs), their evaluation protocols vary widely.
This study investigates various LERSs' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL)
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - Robustness Verification of Deep Reinforcement Learning Based Control
Systems using Reward Martingales [13.069196356472272]
We present the first approach for robustness verification of DRL-based control systems by introducing reward martingales.
Our results provide provably quantitative certificates for the two questions.
We then show that reward martingales can be implemented and trained via neural networks, against different types of control policies.
arXiv Detail & Related papers (2023-12-15T11:16:47Z) - Accountability in Offline Reinforcement Learning: Explaining Decisions
with a Corpus of Examples [70.84093873437425]
This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus.
AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability.
We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability.
arXiv Detail & Related papers (2023-10-11T17:20:32Z) - DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for
CTR Prediction [61.68415731896613]
Click-Through Rate (CTR) prediction is a pivotal task in product and content recommendation.
We propose a model that enables Dynamic Embedding Learning with Truncated Conscious Attention for CTR prediction.
arXiv Detail & Related papers (2023-05-03T12:34:45Z) - Rewards Encoding Environment Dynamics Improves Preference-based
Reinforcement Learning [4.969254618158096]
We show that encoding environment dynamics in the reward function (REED) dramatically reduces the number of preference labels required in state-of-the-art preference-based RL frameworks.
For some domains, REED-based reward functions result in policies that outperform policies trained on the ground truth reward.
arXiv Detail & Related papers (2022-11-12T00:34:41Z) - Age of Semantics in Cooperative Communications: To Expedite Simulation
Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring semantics freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset.
arXiv Detail & Related papers (2022-09-19T11:55:28Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study the offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit massive amounts of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative for value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.