Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
- URL: http://arxiv.org/abs/2410.24108v1
- Date: Thu, 31 Oct 2024 16:38:51 GMT
- Title: Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
- Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
- Abstract summary: Decision Transformers have emerged as a compelling paradigm for offline Reinforcement Learning (RL).
Online finetuning of decision transformers has been surprisingly under-explored.
We find that simply adding TD3 gradients to the finetuning process of ODT effectively improves the online finetuning performance of ODT.
- Score: 111.78179839856293
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Decision Transformers have recently emerged as a new and compelling paradigm for offline Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While improvements have been made to overcome initial shortcomings, online finetuning of decision transformers has been surprisingly under-explored. The widely adopted state-of-the-art Online Decision Transformer (ODT) still struggles when pretrained with low-reward offline data. In this paper, we theoretically analyze the online finetuning of the decision transformer, showing that the commonly used Return-To-Go (RTG) that is far from the expected return hampers the online finetuning process. This problem, however, is well-addressed by the value function and advantage of standard RL algorithms. As suggested by our analysis, in our experiments, we hence find that simply adding TD3 gradients to the finetuning process of ODT effectively improves the online finetuning performance of ODT, especially if ODT is pretrained with low-reward offline data. These findings provide new directions to further improve decision transformers.
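The abstract describes the method only at a high level. As a rough illustration, the PyTorch-style sketch below shows what "adding TD3 gradients to ODT finetuning" could look like: the sequence-modeling loss is kept, and a TD3 critic update plus a weighted actor term are added. All module names, signatures, and the mixing weight `td3_weight` are assumptions, and shapes are simplified to per-timestep tensors; this is not the authors' implementation.

```python
# Rough sketch (not the paper's code): one online finetuning step that adds a
# TD3-style critic/actor term to an ODT-style sequence-modeling loss.
import torch
import torch.nn.functional as F

def finetune_step(batch, policy, critic1, critic2,
                  target_policy, target_critic1, target_critic2,
                  policy_opt, critic_opt,
                  gamma=0.99, policy_noise=0.2, noise_clip=0.5, td3_weight=0.1):
    # Per-timestep tensors sampled from the online replay buffer (shapes simplified).
    states, actions, rtgs, timesteps, rewards, next_states, dones = batch

    # --- TD3 critic update: clipped double Q-learning with target policy smoothing ---
    with torch.no_grad():
        noise = (torch.randn_like(actions) * policy_noise).clamp(-noise_clip, noise_clip)
        next_actions = (target_policy(next_states) + noise).clamp(-1.0, 1.0)
        target_q = torch.min(target_critic1(next_states, next_actions),
                             target_critic2(next_states, next_actions))
        target_q = rewards + gamma * (1.0 - dones) * target_q
    critic_loss = (F.mse_loss(critic1(states, actions), target_q) +
                   F.mse_loss(critic2(states, actions), target_q))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # --- ODT-style autoregressive action loss on the sampled trajectory segment ---
    pred_actions = policy(states, rtgs, timesteps)
    seq_loss = F.mse_loss(pred_actions, actions)

    # --- Added TD3 actor gradient: push predicted actions toward high-Q regions ---
    actor_loss = -critic1(states, pred_actions).mean()

    policy_opt.zero_grad()
    (seq_loss + td3_weight * actor_loss).backward()
    policy_opt.step()
    # Target-network soft updates and TD3's delayed policy updates are omitted for brevity.
    return {"critic_loss": critic_loss.item(), "seq_loss": seq_loss.item(),
            "actor_loss": actor_loss.item()}
```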
Related papers
- Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation [17.750449033873036]
We introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for Offline RLRS (EDT4Rec).
Our approach begins with a max entropy perspective, leading to the development of a max entropy enhanced exploration strategy.
To augment the model's capability to stitch sub-optimal trajectories, we incorporate a unique reward relabeling technique.
arXiv Detail & Related papers (2024-06-02T12:21:10Z)
- Offline Trajectory Generalization for Offline Reinforcement Learning [43.89740983387144]
Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories.
We propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO).
OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with better generalization capability of transformers and high-rewarded data augmentation.
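The summary gives only the high-level idea. As a generic, heavily hedged sketch of world-model-based trajectory augmentation (not OTTO's actual procedure), one could roll a learned dynamics-and-reward model forward from dataset states and keep only high-reward synthetic rollouts; `world_model`, `policy`, and `dataset.sample_states` below are hypothetical.

```python
# Generic sketch of world-model trajectory augmentation (illustrative only).
# `world_model` is assumed to predict the next state and reward from (state, action).
import torch

@torch.no_grad()
def augment_dataset(dataset, world_model, policy, horizon=5, reward_threshold=0.0):
    """Roll the learned world model forward from dataset states and keep
    rollouts whose summed predicted reward exceeds a threshold."""
    augmented = []
    for start_state in dataset.sample_states(batch_size=256):  # hypothetical helper
        state, traj, total_reward = start_state, [], 0.0
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = world_model(state, action)
            traj.append((state, action, reward, next_state))
            total_reward += reward.item()
            state = next_state
        if total_reward > reward_threshold:   # keep only high-reward synthetic trajectories
            augmented.append(traj)
    return augmented
```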
arXiv Detail & Related papers (2024-04-16T08:48:46Z)
- Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining [25.669038513039357]
This paper provides a theoretical framework that analyzes supervised pretraining for in-context reinforcement learning.
We show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms.
arXiv Detail & Related papers (2023-10-12T17:55:02Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
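DPT is summarized as a supervised pretraining method in which the transformer predicts an optimal action. A minimal sketch of that objective, assuming a discrete action space and a hypothetical model signature (condition on an in-context dataset plus a query state, train with cross-entropy against the optimal action), is:

```python
# Minimal sketch of the supervised pretraining objective described for DPT.
# The model signature and batch layout are assumptions.
import torch
import torch.nn.functional as F

def dpt_pretrain_step(model, optimizer, batch):
    # batch: in-context interactions, a query state, and the optimal action label
    context, query_state, optimal_action = batch    # optimal_action: (B,) integer labels
    logits = model(context, query_state)            # (B, num_actions)
    loss = F.cross_entropy(logits, optimal_action)  # supervised prediction of the optimal action
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```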
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- A Survey on Transformers in Reinforcement Learning [66.23773284875843]
Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings.
Recently, a similar surge of using Transformers has appeared in the domain of reinforcement learning (RL), but it is faced with unique design choices and challenges brought by the nature of RL.
This paper systematically reviews motivations and progress on using Transformers in RL, provides a taxonomy of existing works, discusses each sub-field, and summarizes future prospects.
arXiv Detail & Related papers (2023-01-08T14:04:26Z)
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformer, originally devised for natural language processing, has also attained significant success in computer vision.
We group existing developments in two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
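The claim is that a trained transformer's forward pass itself implements learning on the in-context examples. As an assumption-laden toy illustration of that "mesa-optimizer" idea (not the paper's construction), one explicit gradient-descent step on in-context linear-regression examples yields the kind of prediction such a forward pass is argued to compute:

```python
# Toy illustration of "gradient descent in the forward pass": one explicit GD step
# on in-context linear-regression examples (x_i, y_i), then predict for a query input.
import torch

def in_context_gd_prediction(xs, ys, x_query, lr=0.1):
    """xs: (N, d) context inputs, ys: (N,) context targets, x_query: (d,)."""
    w = torch.zeros(xs.shape[1])              # start from a zero linear model
    grad = -(ys - xs @ w) @ xs / xs.shape[0]  # gradient of 0.5 * mean squared error
    w = w - lr * grad                         # one gradient-descent step
    return x_query @ w                        # prediction for the query input
```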
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
- Online Decision Transformer [30.54774566089644]
Offline reinforcement learning (RL) can be formulated as a sequence modeling problem.
The Online Decision Transformer (ODT) is an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning.
arXiv Detail & Related papers (2022-02-11T13:43:24Z)
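ODT is the baseline the main paper above builds on. As a rough, assumption-laden sketch (hypothetical `policy` signature, gymnasium-style `env`), its online data collection conditions the sequence model on a target return-to-go that is decremented by observed rewards as the episode unfolds:

```python
# Hedged sketch of RTG-conditioned online rollout for a decision transformer.
# Names, the policy signature, and the fixed target return are illustrative assumptions.
import torch

@torch.no_grad()
def collect_episode(env, policy, target_return, max_steps=1000):
    states, actions, rtgs = [], [], []
    state, _ = env.reset()                     # gymnasium-style reset
    rtg = float(target_return)                 # return-to-go the policy is conditioned on
    for _ in range(max_steps):
        states.append(torch.as_tensor(state, dtype=torch.float32))
        rtgs.append(torch.tensor(rtg))
        # Condition on the history of states, past actions, and returns-to-go.
        action = policy(torch.stack(states),
                        torch.stack(actions) if actions else None,
                        torch.stack(rtgs))
        actions.append(torch.as_tensor(action, dtype=torch.float32))
        state, reward, terminated, truncated, _ = env.step(action)
        rtg -= float(reward)                   # decrement return-to-go by the observed reward
        if terminated or truncated:
            break
    return states, actions, rtgs
```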