Return-Aligned Decision Transformer
- URL: http://arxiv.org/abs/2402.03923v5
- Date: Thu, 23 Jan 2025 09:08:36 GMT
- Title: Return-Aligned Decision Transformer
- Authors: Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra
- Abstract summary: Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning.
We propose Return-Aligned Decision Transformer (RADT) to more effectively align the actual return with the target return.
- Score: 13.973995766656332
- Abstract: Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. It is increasingly important to adjust the performance of AI agents to meet human requirements, for example, in applications like video games and education tools. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and includes a mechanism to control the agent's performance using the target return. However, the action generation is hardly influenced by the target return because DT's self-attention allocates scarce attention scores to the return tokens. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to more effectively align the actual return with the target return. RADT leverages features extracted by paying attention solely to the return, enabling action generation to consistently depend on the target return. Extensive experiments show that RADT significantly reduces the discrepancies between the actual return and the target return compared to DT-based methods.
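To make the mechanism concrete, here is a minimal PyTorch sketch of the idea the abstract describes: a cross-attention layer whose keys and values come only from return tokens, so the resulting features necessarily carry return information. Module names, shapes, and the residual connection are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ReturnOnlyAttention(nn.Module):
    """Cross-attention whose keys/values are restricted to return tokens."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, token_feats: torch.Tensor, return_feats: torch.Tensor) -> torch.Tensor:
        # token_feats:  (B, T, d) features of state/action tokens (queries)
        # return_feats: (B, R, d) features of the return tokens only (keys/values)
        # Because keys/values contain nothing but return features, the output
        # necessarily depends on the target return -- unlike full self-attention,
        # where return tokens may receive near-zero attention scores.
        out, _ = self.attn(token_feats, return_feats, return_feats)
        return token_feats + out  # residual connection (an assumption)

layer = ReturnOnlyAttention(d_model=64)
feats = layer(torch.randn(2, 10, 64), torch.randn(2, 1, 64))
print(feats.shape)  # torch.Size([2, 10, 64])
```

By construction the output cannot ignore the return tokens, which is exactly the failure mode the abstract attributes to DT's standard self-attention.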
Related papers
- In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning [15.369324784520538]
We propose In-Dataset Trajectory Return Regularization (DTR) for offline preference-based reinforcement learning.
DTR mitigates the risk of learning inaccurate trajectory stitching under reward bias.
We also introduce an ensemble normalization technique that effectively integrates multiple reward models.
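The abstract does not specify the normalization, but a plausible minimal sketch is to z-score each reward model's outputs before averaging, so that models with different output scales contribute comparably (the function and model interfaces below are hypothetical):

```python
import numpy as np

def ensemble_reward(reward_models, inputs):
    """Z-score each model's rewards over the batch, then average, so that
    reward models with different output scales contribute comparably."""
    normalized = []
    for rm in reward_models:
        r = rm(inputs)                                  # shape (N,)
        normalized.append((r - r.mean()) / (r.std() + 1e-8))
    return np.mean(normalized, axis=0)

# Two toy "reward models" with very different scales:
rms = [lambda x: x.sum(axis=1), lambda x: 100.0 * x.sum(axis=1) + 5.0]
print(ensemble_reward(rms, np.random.randn(8, 3)))      # shape (8,)
```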
arXiv Detail & Related papers (2024-12-12T09:35:47Z)
- Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning [26.915055027485465]
We study offline off-dynamics reinforcement learning (RL) to enhance policy learning in a target domain with limited data.
Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on the decision transformer (DT).
We propose the Return Augmented Decision Transformer (RADT) method, where we augment the return in the source domain by aligning its distribution with that in the target domain.
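As a hedged illustration of return augmentation by distribution alignment (the paper's exact alignment procedure may differ), one simple choice is moment matching: rescale source-domain returns to the target domain's mean and standard deviation.

```python
import numpy as np

def augment_returns(src_returns, tgt_returns):
    """Shift and rescale source-domain returns so their mean and standard
    deviation match the target domain's return distribution."""
    mu_s, sd_s = src_returns.mean(), src_returns.std() + 1e-8
    mu_t, sd_t = tgt_returns.mean(), tgt_returns.std() + 1e-8
    return (src_returns - mu_s) / sd_s * sd_t + mu_t

src = np.random.normal(50.0, 10.0, size=1000)   # source-domain returns
tgt = np.random.normal(80.0, 5.0, size=1000)    # target-domain returns
aligned = augment_returns(src, tgt)
print(aligned.mean(), aligned.std())            # roughly 80 and 5
```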
arXiv Detail & Related papers (2024-10-30T20:46:26Z)
- Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning [5.398202201395825]
Decision Transformer (DT) has demonstrated exceptional capabilities in offline reinforcement learning.
Decision ConvFormer (DC) is easier to understand than DT when modeling RL trajectories within a Markov Decision Process.
We propose the Q-value Regularized Decision ConvFormer (QDC), which combines DC's modeling of RL trajectories with a term that maximizes action values.
arXiv Detail & Related papers (2024-09-12T14:10:22Z)
- Adversarially Robust Decision Transformer [17.49328076347261]
We propose a worst-case-aware RvS algorithm, the Adversarially Robust Decision Transformer (ARDT).
ARDT learns and conditions the policy on in-sample minimax returns-to-go.
In large-scale sequential games and continuous adversarial RL environments, ARDT demonstrates significantly superior robustness to powerful test-time adversaries.
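A toy, tabular illustration of what an in-sample minimax return-to-go could look like (a deliberate simplification; ARDT's actual method operates with learned models over large state spaces): back up values using only transitions present in the dataset, minimizing over observed adversary actions and maximizing over observed agent actions.

```python
from collections import defaultdict

# Transitions: (t, state, agent_action, adversary_action, reward, next_state)
dataset = [
    (0, "s0", "a1", "b1", 1.0, "s1"),
    (0, "s0", "a1", "b2", 0.0, "s2"),
    (0, "s0", "a2", "b1", 0.5, "s1"),
    (1, "s1", "a1", "b1", 2.0, "end"),
    (1, "s2", "a1", "b1", 0.0, "end"),
]
H = 2  # horizon

V = defaultdict(float)  # V[(t, state)]; unseen/terminal states default to 0
for t in reversed(range(H)):
    # Q[(state, agent_action)][adversary_action] = reward + next-state value
    Q = defaultdict(dict)
    for (tt, s, a, b, r, s_next) in dataset:
        if tt == t:
            Q[(s, a)][b] = r + V[(t + 1, s_next)]
    # Worst case over in-sample adversary responses, best case over agent actions.
    worst = defaultdict(list)
    for (s, a), adv_vals in Q.items():
        worst[s].append(min(adv_vals.values()))
    for s, vals in worst.items():
        V[(t, s)] = max(vals)

print(V[(0, "s0")])  # 2.5: the return the agent can guarantee from s0
```

Conditioning the policy on these worst-case values, rather than on empirical returns-to-go, is what makes the learned behavior robust to a test-time adversary.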
arXiv Detail & Related papers (2024-07-25T22:12:47Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
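A hedged sketch of such an objective: the usual conditional-sequence-modeling (behavior cloning) loss plus a term rewarding predicted actions with high Q-values. The weighting `alpha` and the network interfaces are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def qt_loss(pred_action, data_action, q_net, state, alpha=0.5):
    bc_loss = ((pred_action - data_action) ** 2).mean()  # CSM / behavior cloning term
    q_term = -q_net(state, pred_action).mean()           # push predicted actions toward high Q
    return bc_loss + alpha * q_term

state = torch.randn(4, 3)
pred = torch.randn(4, 2, requires_grad=True)             # stand-in for model output
data = torch.randn(4, 2)
toy_q = lambda s, a: -(a ** 2).sum(dim=-1)               # stand-in critic
qt_loss(pred, data, toy_q, state).backward()             # gradients flow into `pred`
```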
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Critic-Guided Decision Transformer for Offline Reinforcement Learning [28.211835303617118]
We propose Critic-Guided Decision Transformer (CGDT), a novel approach that combines the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer.
arXiv Detail & Related papers (2023-12-21T10:29:17Z)
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm that leverages recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learned policy and the dataset.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
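A plausible reading of the method (the exact weighting scheme is the paper's; this sketch is an assumption) is return-weighted resampling: trajectories are drawn with probability increasing in their return, which reweights the data while leaving its support unchanged.

```python
import numpy as np

def resample_by_return(trajectories, n_samples, temperature=1.0):
    returns = np.array([sum(step["reward"] for step in traj) for traj in trajectories])
    w = np.exp((returns - returns.max()) / temperature)  # softmax-style weights
    idx = np.random.choice(len(trajectories), size=n_samples, p=w / w.sum())
    return [trajectories[i] for i in idx]

trajs = [[{"reward": r}] * 3 for r in (0.0, 1.0, 5.0)]   # three toy trajectories
print(len(resample_by_return(trajs, 10)))                # 10, biased toward high return
```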
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
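The core trick in Implicit Q-Learning is expectile regression: the state-value function V is fit to an upper expectile of Q over actions that actually appear in the dataset, so training never evaluates out-of-dataset actions. A minimal PyTorch sketch of the expectile loss (network definitions omitted):

```python
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    # u > 0 (Q above V) is weighted by tau, u < 0 by (1 - tau); with tau > 0.5,
    # V is pushed toward the upper end of in-dataset Q-values.
    u = q_values - v_values
    weight = torch.abs(tau - (u < 0).float())
    return (weight * u ** 2).mean()

q = torch.randn(32)                      # Q(s, a) for dataset state-action pairs
v = torch.randn(32, requires_grad=True)  # V(s) predictions
expectile_loss(q, v).backward()
```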
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves over standard fine-tuning by more than 2% on average.
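A hypothetical sketch of how such a regularizer could be wired into fine-tuning (the paper's exact formulation may differ): penalize the distance between the fine-tuned features and the fixed, disentangled target-relevant features.

```python
import torch

def tred_loss(task_loss, features, target_relevant_features, beta=0.1):
    # Keep the fine-tuned features close to the disentangled, target-relevant
    # representation extracted from the source model (detached: a fixed target).
    reg = ((features - target_relevant_features.detach()) ** 2).mean()
    return task_loss + beta * reg

feats = torch.randn(4, 16, requires_grad=True)   # fine-tuned features
anchor = torch.randn(4, 16)                      # target-relevant features
tred_loss(torch.tensor(0.0), feats, anchor).backward()
```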
arXiv Detail & Related papers (2020-10-16T17:45:08Z)