Causal Decision Transformer for Recommender Systems via Offline
Reinforcement Learning
- URL: http://arxiv.org/abs/2304.07920v2
- Date: Sun, 27 Aug 2023 10:09:22 GMT
- Title: Causal Decision Transformer for Recommender Systems via Offline
Reinforcement Learning
- Authors: Siyu Wang and Xiaocong Chen and Dietmar Jannach and Lina Yao
- Abstract summary: We propose a new model named the causal decision transformer for recommender systems (CDT4Rec)
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
- Score: 23.638418776700522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning-based recommender systems have recently gained
popularity. However, the design of the reward function, on which the agent
relies to optimize its recommendation policy, is often not straightforward.
Exploring the causality underlying users' behavior can take the place of the
reward function in guiding the agent to capture the dynamic interests of users.
Moreover, due to the typical limitations of simulation environments (e.g., data
inefficiency), most of this work cannot be broadly applied in large-scale
settings. Although some works attempt to convert the offline dataset into a
simulator, data inefficiency makes the learning process even slower. Because
reinforcement learning learns by interaction, an agent cannot collect enough
training data within a single interaction. Furthermore, traditional
reinforcement learning algorithms lack the ability of supervised learning
methods to learn directly from offline datasets. In
this paper, we propose a new model named the causal decision transformer for
recommender systems (CDT4Rec). CDT4Rec is an offline reinforcement learning
system that can learn from a dataset rather than from online interaction.
Moreover, CDT4Rec employs the transformer architecture, which is capable of
processing large offline datasets and capturing both short-term and long-term
dependencies within the data to estimate the causal relationship between
action, state, and reward. To demonstrate the feasibility and superiority of
our model, we have conducted experiments on six real-world offline datasets and
one online simulator.
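The core idea, a decision-transformer-style sequence model trained purely offline on trajectories of (return-to-go, state, action) tokens under a causal attention mask, can be sketched in a few lines. The PyTorch sketch below is an illustrative assumption rather than the authors' implementation: the class name DecisionTransformer4Rec, the layer sizes, and the use of a standard TransformerEncoder are hypothetical choices, and the causal reward-estimation component that distinguishes CDT4Rec is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecisionTransformer4Rec(nn.Module):
    """Minimal decision-transformer-style sketch for offline RL recommendation.

    Hypothetical simplification: interleaves (return-to-go, state, action)
    tokens and autoregressively predicts the next item (action). All
    hyper-parameters and layer choices are illustrative assumptions.
    """

    def __init__(self, n_items, state_dim, d_model=64, n_heads=4, n_layers=2, max_len=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)           # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model) # user-state token
        self.embed_action = nn.Embedding(n_items, d_model)
        self.embed_timestep = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_items)   # scores over the item catalogue

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T), timesteps: (B, T)
        B, T = actions.shape
        pos = self.embed_timestep(timesteps)
        r = self.embed_rtg(rtg) + pos
        s = self.embed_state(states) + pos
        a = self.embed_action(actions) + pos
        # interleave tokens as (r_1, s_1, a_1, r_2, s_2, a_2, ...)
        tokens = torch.stack([r, s, a], dim=2).reshape(B, 3 * T, -1)
        # causal mask: each token may only attend to earlier tokens
        mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # predict action a_t from the state token s_t (every 3rd token, offset 1)
        state_repr = h[:, 1::3, :]
        return self.action_head(state_repr)              # (B, T, n_items) logits


# toy usage: a batch of two length-5 logged trajectories
model = DecisionTransformer4Rec(n_items=1000, state_dim=16)
rtg = torch.rand(2, 5, 1)
states = torch.randn(2, 5, 16)
actions = torch.randint(0, 1000, (2, 5))
timesteps = torch.arange(5).unsqueeze(0).repeat(2, 1)
logits = model(rtg, states, actions, timesteps)
loss = F.cross_entropy(logits.reshape(-1, 1000), actions.reshape(-1))
```

At inference time one would, as in the original Decision Transformer, condition on a desired return-to-go and feed each predicted item back into the sequence to generate the next recommendation.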
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z) - Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline
Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z) - Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - Offline Robot Reinforcement Learning with Uncertainty-Guided Human
Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z) - Towards Data-Driven Offline Simulations for Online Reinforcement
Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement
Learning [28.947071041811586]
Offline reinforcement learning proposes to learn policies from large collected datasets without interaction.
Current algorithms overfit to the dataset they are trained on and generalize poorly out-of-distribution when deployed in the environment.
We propose a Surprisingly Simple Self-Supervision algorithm (S4RL) which utilizes data augmentations from states to learn value functions that are better at generalizing and extrapolating when deployed in the environment.
arXiv Detail & Related papers (2021-03-10T20:13:21Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.