Decoupling Return-to-Go for Efficient Decision Transformer
- URL: http://arxiv.org/abs/2601.15953v1
- Date: Thu, 22 Jan 2026 13:42:08 GMT
- Title: Decoupling Return-to-Go for Efficient Decision Transformer
- Authors: Yongyi Wang, Hanyu Liu, Lingfeng Li, Bozhou Chen, Ang Li, Qirui Zheng, Xionghui Yang, Wenxin Li,
- Abstract summary: Decision Transformer (DT) has established a powerful sequence modeling approach to offline reinforcement learning. Decoupled DT (DDT) simplifies the architecture by processing only observation and action sequences through the Transformer. Our experiments show that DDT significantly outperforms DT and establishes competitive performance against state-of-the-art DT variants.
- Score: 6.429850804144503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Decision Transformer (DT) has established a powerful sequence modeling approach to offline reinforcement learning. It conditions its action predictions on Return-to-Go (RTG), using it both to distinguish trajectory quality during training and to guide action generation at inference. In this work, we identify a critical redundancy in this design: feeding the entire sequence of RTGs into the Transformer is theoretically unnecessary, as only the most recent RTG affects action prediction. We show that this redundancy can impair DT's performance through experiments. To resolve this, we propose the Decoupled DT (DDT). DDT simplifies the architecture by processing only observation and action sequences through the Transformer, using the latest RTG to guide the action prediction. This streamlined approach not only improves performance but also reduces computational cost. Our experiments show that DDT significantly outperforms DT and establishes competitive performance against state-of-the-art DT variants across multiple offline RL tasks.
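The architectural change described in the abstract can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the authors' implementation: it only contrasts how vanilla DT interleaves (RTG, observation, action) tokens, while DDT feeds the Transformer only (observation, action) tokens and sets the single latest RTG aside to condition action prediction, shrinking the sequence from 3K to 2K tokens for a context window of K timesteps.

```python
# Hypothetical helper names; contrasts the token sequences fed to the
# Transformer backbone in DT versus DDT.

def dt_tokens(rtgs, obs, acts):
    """Vanilla DT: interleave (RTG, observation, action) per timestep."""
    seq = []
    for g, s, a in zip(rtgs, obs, acts):
        seq += [("rtg", g), ("obs", s), ("act", a)]
    return seq

def ddt_tokens(rtgs, obs, acts):
    """DDT: the Transformer sees only (observation, action) tokens;
    the latest RTG is kept aside to guide the action head."""
    seq = []
    for s, a in zip(obs, acts):
        seq += [("obs", s), ("act", a)]
    latest_rtg = rtgs[-1]  # only this value affects action prediction
    return seq, latest_rtg

# Toy trajectory with context length K = 4.
rtgs = [10.0, 9.0, 7.5, 6.0]
obs = ["s0", "s1", "s2", "s3"]
acts = ["a0", "a1", "a2", "a3"]

dt_seq = dt_tokens(rtgs, obs, acts)
ddt_seq, g = ddt_tokens(rtgs, obs, acts)

print(len(dt_seq))   # 3K = 12 tokens through the Transformer
print(len(ddt_seq))  # 2K = 8 tokens, a one-third shorter sequence
print(g)             # 6.0, the only RTG needed for the prediction
```

The shorter sequence is where the claimed computational saving comes from: self-attention cost grows with sequence length, so dropping the per-step RTG tokens reduces it without discarding the return signal that steers the policy.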
Related papers
- Adjusting the Output of Decision Transformer with Action Gradient [5.448998267117127]
Action Gradient (AG) is an innovative methodology that directly adjusts actions to fulfill a function analogous to that of PG. AG utilizes the gradient of the Q-value with respect to the action to optimize the action. Our method can significantly enhance the performance of DT-based algorithms, with some results achieving state-of-the-art levels.
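The core AG update described above can be sketched as gradient ascent on Q with respect to the action. This is a hedged sketch under my own assumptions (the abstract does not specify the Q-function or step schedule): Q here is a toy differentiable quadratic with a known maximizer, used only to show the update rule.

```python
# Toy differentiable Q-function over a scalar action, maximized at A_STAR.
A_STAR = 0.5

def q_value(a):
    return -(a - A_STAR) ** 2

def q_grad(a):
    # Analytic gradient of Q with respect to the action.
    return -2.0 * (a - A_STAR)

def action_gradient_step(a, lr=0.1, n_steps=50):
    """Refine an action via gradient ascent on Q(s, a) w.r.t. a."""
    for _ in range(n_steps):
        a += lr * q_grad(a)
    return a

a0 = 0.0                           # action proposed by a DT-style policy
a_adj = action_gradient_step(a0)   # AG-style refinement
print(round(a_adj, 3))             # converges toward A_STAR = 0.5
```

In practice the Q-function would be a learned critic and the gradient would come from automatic differentiation, but the adjustment rule is the same: nudge the proposed action uphill on Q.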
arXiv Detail & Related papers (2025-10-06T18:54:42Z)
- Large Language Model-Empowered Decision Transformer for UAV-Enabled Data Collection [71.84636717632206]
Using unmanned aerial vehicles (UAVs) for reliable and energy-efficient data collection from spatially distributed devices holds great promise in supporting Internet of Things (IoT) applications. We propose a large language model (LLM)-empowered approach to learn effective UAV control policies. LLM-CRDT outperforms benchmark online and offline methods, achieving up to 36.7% higher energy efficiency than current state-of-the-art DT approaches.
arXiv Detail & Related papers (2025-09-17T13:05:08Z)
- Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought [46.71030329872635]
Chain of Thought (CoT) prompting has been shown to significantly improve the performance of large language models (LLMs). We study the training dynamics of transformers over a CoT objective on an in-context weight prediction task for linear regression.
arXiv Detail & Related papers (2025-02-28T16:40:38Z)
- Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers [111.78179839856293]
Decision Transformers have emerged as a compelling paradigm for offline Reinforcement Learning (RL).
Online finetuning of decision transformers has been surprisingly under-explored.
We find that simply adding TD3 gradients to the finetuning process of ODT effectively improves the online finetuning performance of ODT.
arXiv Detail & Related papers (2024-10-31T16:38:51Z)
- Predictive Coding for Decision Transformer [21.28952990360392]
The Decision Transformer (DT) architecture has shown promise across various domains. Despite its initial success, DT has underperformed on several challenging datasets in goal-conditioned RL. We propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods.
arXiv Detail & Related papers (2024-10-04T13:17:34Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Generalized Decision Transformer for Offline Hindsight Information Matching [16.7594941269479]
We present Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem.
We show how different choices for the feature function and the anti-causal aggregator lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future.
arXiv Detail & Related papers (2021-11-19T18:56:13Z)
- Rethinking Transformer-based Set Prediction for Object Detection [57.7208561353529]
Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
arXiv Detail & Related papers (2020-11-21T21:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.