Elastic Decision Transformer
- URL: http://arxiv.org/abs/2307.02484v6
- Date: Fri, 20 Oct 2023 05:12:04 GMT
- Title: Elastic Decision Transformer
- Authors: Yueh-Hua Wu, Xiaolong Wang, Masashi Hamaya
- Abstract summary: Elastic Decision Transformer (EDT) is a significant advancement over the existing Decision Transformer (DT).
EDT facilitates trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT.
Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q-learning-based approaches.
- Score: 18.085153645646646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces Elastic Decision Transformer (EDT), a significant
advancement over the existing Decision Transformer (DT) and its variants.
Although DT purports to generate an optimal trajectory, empirical evidence
suggests it struggles with trajectory stitching, a process involving the
generation of an optimal or near-optimal trajectory from the best parts of a
set of sub-optimal trajectories. The proposed EDT differentiates itself by
facilitating trajectory stitching during action inference at test time,
achieved by adjusting the history length maintained in DT. Further, the EDT
optimizes the trajectory by retaining a longer history when the previous
trajectory is optimal and a shorter one when it is sub-optimal, enabling it to
"stitch" with a more optimal trajectory. Extensive experimentation demonstrates
EDT's ability to bridge the performance gap between DT-based and
Q-learning-based approaches. In particular, EDT outperforms Q-learning-based
methods in a multi-task regime on the D4RL locomotion benchmark and Atari
games. Videos are available at: https://kristery.github.io/edt/
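The abstract's core mechanism, choosing a shorter or longer history at test time based on which truncation promises the higher achievable return, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `select_history_length`, `estimate_max_return`, and the toy lookup estimator are all hypothetical names standing in for EDT's learned maximal-return estimator.

```python
def select_history_length(history, candidate_lengths, estimate_max_return):
    """Pick the truncated history whose estimated achievable return is highest.

    EDT's idea: keep a long history when the recent trajectory is near-optimal
    (more context helps), and a short one when it is sub-optimal, so the model
    can "stitch" onto a better trajectory at inference time.
    """
    best_len, best_ret = None, float("-inf")
    for length in candidate_lengths:
        ret = estimate_max_return(history[-length:])  # score this truncation
        if ret > best_ret:
            best_len, best_ret = length, ret
    return best_len, best_ret


# Toy stand-in for the learned estimator: here, shorter contexts happen to
# promise higher returns (the recent trajectory is sub-optimal), so the
# selector should discard most of the history.
def toy_estimator(ctx):
    return {1: 0.9, 2: 0.5, 5: 0.2}[len(ctx)]


history = [0.1, 0.2, 0.3, 0.0, 0.0]  # illustrative per-step rewards
length, est = select_history_length(history, [1, 2, 5], toy_estimator)
```

With this toy estimator, the selector keeps only the single most recent step, mirroring the abstract's "shorter history when the previous trajectory is sub-optimal" behavior; the action would then be inferred from that truncated context.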
Related papers
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening [56.99266993852532]
Diffusion-Sharpening is a fine-tuning approach that enhances downstream alignment by optimizing sampling trajectories.
Our method demonstrates superior training efficiency with faster convergence, and best inference efficiency without requiring additional NFEs.
arXiv Detail & Related papers (2025-02-17T18:57:26Z)
- Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots.
This is achieved through an optimization consistency training protocol, which minimizes the difference among samples.
Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
arXiv Detail & Related papers (2025-02-05T07:13:43Z)
- DRDT3: Diffusion-Refined Decision Test-Time Training Model [6.907105812732423]
Decision Transformer (DT) has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches.
We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models.
arXiv Detail & Related papers (2025-01-12T04:59:49Z)
- Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation [29.952637757286073]
Decision Transformer (DT) can learn effective policy from offline datasets by converting the offline reinforcement learning (RL) into a supervised sequence modeling task.
We introduce Diffusion-Based Trajectory Branch Generation (BG), which expands the trajectories of the dataset with branches generated by a diffusion model.
BG outperforms state-of-the-art sequence modeling methods on D4RL benchmark.
arXiv Detail & Related papers (2024-11-18T06:44:14Z)
- Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers [111.78179839856293]
Decision Transformers have emerged as a compelling paradigm for offline Reinforcement Learning (RL).
Online finetuning of decision transformers has been surprisingly under-explored.
We find that simply adding TD3 gradients to the finetuning process of ODT effectively improves the online finetuning performance of ODT.
arXiv Detail & Related papers (2024-10-31T16:38:51Z)
- Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning [5.398202201395825]
Decision Transformer (DT) has demonstrated exceptional capabilities in offline reinforcement learning.
Decision ConvFormer (DC) is easier to understand in the context of modeling RL trajectories within a Markov Decision Process.
We propose the Q-value Regularized Decision ConvFormer (QDC), which combines the understanding of RL trajectories by DC and incorporates a term that maximizes action values.
arXiv Detail & Related papers (2024-09-12T14:10:22Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning [19.84386060857712]
This paper introduces DiffTORI, which utilizes Differentiable Trajectory optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning.
Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
arXiv Detail & Related papers (2024-02-08T05:26:40Z)
- Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference [53.419249906014194]
We study generative modeling for planning with datasets repurposed from offline reinforcement learning.
We introduce the Latent Plan Transformer, a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return.
arXiv Detail & Related papers (2024-02-07T08:18:09Z)
- Context-Former: Stitching via Latent Conditioned Sequence Modeling [31.250234478757665]
We introduce ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectories.
Experiments show ContextFormer can achieve competitive performance in multiple IL settings.
arXiv Detail & Related papers (2024-01-29T06:05:14Z)
- Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets [30.044393664203483]
We present a novel approach to enhance RvS methods by integrating intermediate targets.
We introduce the Waypoint Transformer (WT), using an architecture that builds upon the DT framework and conditioned on automatically-generated waypoints.
The results show a significant increase in the final return compared to existing RvS methods, with performance on par or greater than existing state-of-the-art temporal difference learning-based methods.
arXiv Detail & Related papers (2023-06-24T22:25:29Z)
- Different Tunes Played with Equal Skill: Exploring a Unified Optimization Subspace for Delta Tuning [95.72622659619445]
Delta tuning (DET) is regarded as a new paradigm for using pre-trained language models (PLMs).
Various DETs with distinct design elements have been proposed, achieving performance on par with fine-tuning.
arXiv Detail & Related papers (2022-10-24T14:57:35Z)
- Feasible Low-thrust Trajectory Identification via a Deep Neural Network Classifier [1.5076964620370268]
This work proposes a deep neural network (DNN) to accurately identify feasible low-thrust transfers prior to the optimization process.
The DNN-classifier achieves an overall accuracy of 97.9%, which has the best performance among the tested algorithms.
arXiv Detail & Related papers (2022-02-10T11:34:37Z)
- Event-Based Feature Tracking in Continuous Time with Sliding Window Optimization [55.11913183006984]
We propose a novel method for continuous-time feature tracking in event cameras.
We track features by aligning events along an estimated trajectory in space-time.
We experimentally confirm that the proposed sliding-window B-spline optimization leads to longer and more accurate feature tracks.
arXiv Detail & Related papers (2021-07-09T16:41:20Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences.