Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation
- URL: http://arxiv.org/abs/2406.00725v1
- Date: Sun, 02 Jun 2024 12:21:10 GMT
- Title: Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation
- Authors: Xiaocong Chen, Siyu Wang, Lina Yao
- Abstract summary: We introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for Offline RLRS (EDT4Rec).
Our approach begins with a max entropy perspective, leading to the development of a max entropy enhanced exploration strategy.
To augment the model's capability to stitch sub-optimal trajectories, we incorporate a unique reward relabeling technique.
- Score: 17.750449033873036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning-based recommender systems have recently gained popularity. However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains. To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for their data-driven approach utilizing offline datasets. A prominent example of this is the Decision Transformer. Despite its popularity, the Decision Transformer approach has inherent drawbacks, particularly evident in recommendation methods based on it. This paper identifies two key shortcomings in existing Decision Transformer-based methods: a lack of stitching capability and limited effectiveness in online adoption. In response, we introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for Offline RLRS (EDT4Rec). Our approach begins with a max entropy perspective, leading to the development of a max entropy enhanced exploration strategy. This strategy is designed to facilitate more effective exploration in online environments. Additionally, to augment the model's capability to stitch sub-optimal trajectories, we incorporate a unique reward relabeling technique. To validate the effectiveness and superiority of EDT4Rec, we have conducted comprehensive experiments across six real-world offline datasets and in an online simulator.
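The abstract above centers on two mechanisms: return-conditioned action prediction (the Decision Transformer backbone) and a reward-relabeling step that makes sub-optimal trajectories easier to stitch. The following is a minimal sketch of the general mechanism only; the specific relabeling rule (taking the larger of the observed reward and a value estimate) and the toy numbers are illustrative assumptions, not EDT4Rec's exact formulation.
```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    # Return-to-go target for every timestep of one trajectory.
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def relabel_rewards(rewards, value_estimates):
    # Illustrative relabeling rule (an assumption, not the paper's exact one):
    # take the larger of the observed reward and a learned value estimate so
    # that sub-optimal trajectories carry more optimistic return-to-go
    # targets, which is what allows them to be stitched toward better behaviour.
    return np.maximum(rewards, value_estimates)

# Toy usage: a 4-step trajectory with mediocre observed rewards.
rewards = np.array([0.0, 1.0, 0.0, 1.0])
values = np.array([0.5, 0.5, 0.8, 1.0])   # hypothetical critic outputs
print(returns_to_go(rewards))                            # targets from raw data
print(returns_to_go(relabel_rewards(rewards, values)))   # relabeled targets
```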
Related papers
- Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers [111.78179839856293]
Decision Transformers have emerged as a compelling paradigm for offline Reinforcement Learning (RL).
Online finetuning of Decision Transformers has been surprisingly under-explored.
We find that simply adding TD3 gradients to the finetuning process of ODT effectively improves its online performance.
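A minimal PyTorch sketch of the combination described above: the usual Decision Transformer action-prediction (behaviour-cloning) loss is augmented with a TD3-style policy-gradient term from a learned critic during online finetuning. The toy actor/critic modules and the weighting `alpha` are illustrative assumptions, not the paper's exact recipe.
```python
import torch
import torch.nn.functional as F

def finetune_step(actor, critic, batch, actor_opt, alpha=0.1):
    # Behaviour-cloning loss on the dataset actions (the usual DT objective)
    # minus a TD3-style critic term that pushes predicted actions toward
    # higher Q-values.
    states, target_actions = batch["states"], batch["actions"]
    pred_actions = actor(states)
    bc_loss = F.mse_loss(pred_actions, target_actions)
    q_value = critic(states, pred_actions).mean()
    loss = bc_loss - alpha * q_value
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()

# Hypothetical usage with toy linear actor/critic stand-ins.
actor = torch.nn.Linear(4, 2)
critic = torch.nn.Bilinear(4, 2, 1)
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
batch = {"states": torch.randn(8, 4), "actions": torch.randn(8, 2)}
print(finetune_step(actor, critic, batch, opt))
```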
arXiv Detail & Related papers (2024-10-31T16:38:51Z)
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving [18.34685506480288]
We propose the Sample-efficient Imitative Multi-token Decision Transformer (SimDT).
SimDT introduces multi-token prediction, an online imitative learning pipeline, and prioritized experience replay into sequence-modelling reinforcement learning.
Results exceed popular imitation and reinforcement learning algorithms in both open-loop and closed-loop settings on the Waymax benchmark.
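A minimal sketch of the multi-token prediction idea named above: a single head emits the next several actions from one hidden state instead of only the immediate next action. Layer sizes and the class name are illustrative assumptions, not SimDT's actual architecture.
```python
import torch
import torch.nn as nn

class MultiTokenActionHead(nn.Module):
    # Predicts the next `horizon` actions from a single hidden state rather
    # than only the immediate next action.
    def __init__(self, hidden_dim, action_dim, horizon=3):
        super().__init__()
        self.horizon = horizon
        self.action_dim = action_dim
        self.proj = nn.Linear(hidden_dim, horizon * action_dim)

    def forward(self, hidden):                 # hidden: (batch, hidden_dim)
        out = self.proj(hidden)
        return out.view(-1, self.horizon, self.action_dim)

head = MultiTokenActionHead(hidden_dim=128, action_dim=2)
print(head(torch.randn(4, 128)).shape)         # torch.Size([4, 3, 2])
```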
arXiv Detail & Related papers (2024-06-18T14:27:14Z)
- Offline Trajectory Optimization for Offline Reinforcement Learning [42.306438854850434]
Offline reinforcement learning aims to learn policies without online exploration.
Existing data augmentation methods for offline RL suffer from (i) trivial improvement from short-horizon simulation.
We propose Offline Trajectory Optimization for offline reinforcement learning (OTTO).
arXiv Detail & Related papers (2024-04-16T08:48:46Z)
- Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems [17.750449033873036]
Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications.
Yet, they grapple with challenges, notably in crafting reward functions and harnessing large pre-existing datasets.
Recent advances in offline RLRS offer a way to address these two challenges.
arXiv Detail & Related papers (2024-03-26T12:08:58Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
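A generic sketch of model-based value expansion, the mechanism the summary above refers to: roll a learned dynamics and reward model forward for a few steps and bootstrap with a value function to form a multi-step target. The callables, horizon, and toy numbers below are placeholders, not MOTO's actual components.
```python
def value_expansion_target(state, dynamics_model, reward_model, policy,
                           value_fn, horizon=3, gamma=0.99):
    # Accumulate model-predicted rewards for `horizon` steps, then bootstrap
    # with the value function at the imagined final state.
    target, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        target += discount * reward_model(s, a)
        s = dynamics_model(s, a)
        discount *= gamma
    return target + discount * value_fn(s)

# Toy usage with scalar stand-ins for the learned components.
print(value_expansion_target(
    0.0,
    dynamics_model=lambda s, a: s + a,
    reward_model=lambda s, a: 1.0,
    policy=lambda s: 0.5,
    value_fn=lambda s: 10.0,
))
```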
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm that leverages recent advances in transformer architectures for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
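A schematic sketch of that hierarchical reading: a high-level policy emits a prompt (for DT, the remaining target return) and a low-level policy chooses actions conditioned on it. The concrete functions below are illustrative assumptions, not the paper's formal construction.
```python
def high_level_policy(history):
    # DT's choice of prompt: the remaining target return (return-to-go).
    return max(0.0, history["target_return"] - history["reward_so_far"])

def low_level_policy(state, prompt, action_model):
    # Acts conditioned on both the current state and the high-level prompt.
    return action_model(state, prompt)

history = {"target_return": 10.0, "reward_so_far": 3.0}
prompt = high_level_policy(history)
action = low_level_policy(state=0.0, prompt=prompt,
                          action_model=lambda s, g: 1.0 if g > 0 else 0.0)
print(prompt, action)   # 7.0 1.0
```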
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
This is the first time a transformer-based model has matched both classes of methods.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the causal decision transformer for recommender systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
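A minimal sketch of one simple way to leverage offline data while learning online, in the spirit of the entry above: draw each minibatch partly from the offline dataset and partly from the online replay buffer. The 50/50 split and the toy transition layout are illustrative assumptions, not necessarily the paper's exact recommendations.
```python
import random

def sample_mixed_batch(offline_data, online_buffer, batch_size=256):
    # Half of each minibatch from the offline dataset, half from the online
    # replay buffer (the 50/50 split is an illustrative assumption).
    half = batch_size // 2
    offline_part = random.sample(offline_data, min(half, len(offline_data)))
    online_part = random.sample(online_buffer,
                                min(batch_size - half, len(online_buffer)))
    return offline_part + online_part

# Toy transitions (state, action, reward, next_state).
offline_data = [("s_off", "a", 1.0, "s2")] * 1000
online_buffer = [("s_on", "a", 0.0, "s2")] * 100
print(len(sample_mixed_batch(offline_data, online_buffer, batch_size=8)))
```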