Rethinking Decision Transformer via Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2311.00267v1
- Date: Wed, 1 Nov 2023 03:32:13 GMT
- Title: Rethinking Decision Transformer via Hierarchical Reinforcement Learning
- Authors: Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao
- Abstract summary: Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
- Score: 54.3596066989024
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Decision Transformer (DT) is an innovative algorithm leveraging recent
advances of the transformer architecture in reinforcement learning (RL).
However, a notable limitation of DT is its reliance on recalling trajectories
from datasets, losing the capability to seamlessly stitch sub-optimal
trajectories together. In this work we introduce a general sequence modeling
framework for studying sequential decision making through the lens of
Hierarchical RL. At the time of making decisions, a high-level policy first
proposes an ideal prompt for the current state, and a low-level policy subsequently
generates an action conditioned on the given prompt. We show DT emerges as a
special case of this framework with certain choices of high-level and low-level
policies, and discuss the potential failure of these choices. Inspired by these
observations, we study how to jointly optimize the high-level and low-level
policies to enable the stitching ability, which further leads to the
development of new offline RL algorithms. Our empirical results clearly show
that the proposed algorithms significantly surpass DT on several control and
navigation benchmarks. We hope our contributions can inspire the integration of
transformer architectures within the field of RL.
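The two-stage decision loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `hierarchical_rollout`, `make_dt_high_level`, `toy_low_level`, and `toy_env_step` are hypothetical, and the environment and low-level policy are hand-written toys rather than learned models. The DT-style high-level policy shown here simply emits the return-to-go (a target return minus the reward collected so far), which is how the original Decision Transformer is conditioned at evaluation time.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases, for illustration only.
State = int
Action = int
Prompt = float


def hierarchical_rollout(
    state: State,
    high_level: Callable[[State, float], Prompt],
    low_level: Callable[[State, Prompt], Action],
    env_step: Callable[[State, Action], Tuple[State, float]],
    horizon: int,
) -> Tuple[float, List[Prompt]]:
    """Run the loop: the high-level policy proposes a prompt, then the
    low-level policy picks an action conditioned on that prompt."""
    total_reward = 0.0
    prompts: List[Prompt] = []
    for _ in range(horizon):
        prompt = high_level(state, total_reward)
        action = low_level(state, prompt)
        state, reward = env_step(state, action)
        total_reward += reward
        prompts.append(prompt)
    return total_reward, prompts


def make_dt_high_level(target_return: float) -> Callable[[State, float], Prompt]:
    """DT-style choice (assumed here): the prompt is the return-to-go."""
    def high_level(state: State, reward_so_far: float) -> Prompt:
        return target_return - reward_so_far
    return high_level


def toy_low_level(state: State, prompt: Prompt) -> Action:
    # Stand-in for a learned policy: act only while return is still requested.
    return 1 if prompt > 0 else 0


def toy_env_step(state: State, action: Action) -> Tuple[State, float]:
    # Toy chain environment: moving right (action 1) yields reward 1.
    return state + action, float(action)


total, prompts = hierarchical_rollout(
    0, make_dt_high_level(3.0), toy_low_level, toy_env_step, horizon=5
)
print(total, prompts)
```

In this sketch the high-level policy is a fixed rule, not a learned one. Swapping `make_dt_high_level` for a learned prompt generator is the kind of change the framework is meant to accommodate, though the abstract does not specify how the proposed algorithms do this.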
Related papers
- Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning [26.915055027485465]
We study offline off-dynamics reinforcement learning (RL) to enhance policy learning in a target domain with limited data.
Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on the decision transformer (DT).
We propose the Return Augmented Decision Transformer (RADT) method, where we augment the return in the source domain by aligning its distribution with that in the target domain.
(2024-10-30T20:46:26Z)
- Predictive Coding for Decision Transformer [21.28952990360392]
Decision transformer (DT) architecture has shown promise across various domains.
Despite its initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL.
We propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods.
(2024-10-04T13:17:34Z)
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Results show a significant performance improvement from our method compared with several well-known baselines.
(2024-09-11T17:01:06Z)
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
(2024-05-27T12:12:39Z)
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
The Transformer, originally devised for natural language processing, has also achieved significant success in computer vision.
We group existing developments in two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving.
(2022-12-29T03:15:59Z)
- Hyperbolic Deep Reinforcement Learning [8.983647543608226]
We propose a new class of deep reinforcement learning algorithms that model latent representations in hyperbolic space.
We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks.
(2022-10-04T12:03:04Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
(2022-09-15T07:22:58Z)
- Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
(2020-08-23T16:18:44Z)
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
(2020-05-25T01:42:55Z)