DRDT3: Diffusion-Refined Decision Test-Time Training Model
- URL: http://arxiv.org/abs/2501.06718v1
- Date: Sun, 12 Jan 2025 04:59:49 GMT
- Title: DRDT3: Diffusion-Refined Decision Test-Time Training Model
- Authors: Xingshuai Huang, Di Wu, Benoit Boulet
- Abstract summary: Decision Transformer (DT) has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches. We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models.
- Score: 6.907105812732423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision Transformer (DT), a trajectory modeling method, has shown performance competitive with traditional offline reinforcement learning (RL) approaches on various classic control tasks. However, it struggles to learn optimal policies from suboptimal, reward-labeled trajectories. In this study, we explore the use of conditional generative modeling to facilitate trajectory stitching, given its ability to generate high-quality data. Additionally, recent advances in recurrent neural networks (RNNs) have shown that they offer linear complexity and sequence modeling performance competitive with Transformers. We leverage the Test-Time Training (TTT) layer, an RNN that updates its hidden states during testing, to model trajectories in the manner of DT. We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models. Specifically, we propose the Decision TTT (DT3) module, which harnesses the sequence modeling strengths of both self-attention and the TTT layer to capture recent contextual information and make coarse action predictions. We further integrate DT3 with the diffusion model through a unified optimization objective. In experiments on multiple Gym and AntMaze tasks from the D4RL benchmark, our DT3 model without diffusion refinement improves over standard DT, while DRDT3 further achieves superior results compared to state-of-the-art conventional offline RL and DT-based methods.
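The abstract's central architectural idea, a DT3 module that pairs self-attention with a TTT layer to emit coarse actions later refined by a diffusion model, can be sketched roughly as follows. This is a minimal illustration under stated assumptions (layer sizes, a linear TTT update rule, a hypothetical 6-dimensional action head), not the authors' released code; the diffusion refinement stage is omitted.

```python
# A minimal sketch, not the authors' implementation: a DT3-style block that
# combines self-attention over recent context with a TTT-style recurrent layer
# whose hidden state is updated by gradient steps even at test time.
import torch
import torch.nn as nn


class SimpleTTTLayer(nn.Module):
    """Toy TTT-style layer: the hidden state is a matrix W updated at every
    time step by a gradient step on a self-supervised reconstruction loss."""

    def __init__(self, dim: int, lr: float = 0.1):
        super().__init__()
        self.lr = lr
        self.key = nn.Linear(dim, dim, bias=False)
        self.value = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape                                    # (batch, seq, dim)
        W = torch.zeros(b, d, d, device=x.device)            # per-sequence hidden state
        outputs = []
        for step in range(t):
            k, v = self.key(x[:, step]), self.value(x[:, step])
            pred = torch.bmm(k.unsqueeze(1), W).squeeze(1)   # current reconstruction of v
            # gradient of ||W^T k - v||^2 w.r.t. W is the outer product k (pred - v)^T
            # (constant factor dropped); take one descent step on it
            W = W - self.lr * torch.bmm(k.unsqueeze(2), (pred - v).unsqueeze(1))
            outputs.append(torch.bmm(k.unsqueeze(1), W).squeeze(1))
        return torch.stack(outputs, dim=1)


class DT3Block(nn.Module):
    """Coarse action predictor: self-attention for recent context, then the
    TTT-style layer, then a small action head (all sizes are placeholders)."""

    def __init__(self, dim: int = 128, heads: int = 4, act_dim: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ttt = SimpleTTTLayer(dim)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.action_head = nn.Linear(dim, act_dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.attn(tokens, tokens, tokens)
        h = self.norm1(tokens + h)
        h = self.norm2(h + self.ttt(h))
        return self.action_head(h)   # coarse actions; DRDT3 would refine these with diffusion


if __name__ == "__main__":
    tokens = torch.randn(2, 20, 128)    # (batch, context length, embedding dim)
    print(DT3Block()(tokens).shape)     # torch.Size([2, 20, 6])
```

The defining property of the TTT layer, as the abstract notes, is that its hidden state (the matrix W here) keeps being updated by gradient steps during testing, unlike a frozen recurrent cell.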
Related papers
- CTA: Cross-Task Alignment for Better Test Time Training [10.54024648915477]
Test-Time Training (TTT) has emerged as an effective method to enhance model robustness. We introduce CTA (Cross-Task Alignment), a novel approach for improving TTT. We show substantial improvements in robustness and generalization over the state-of-the-art on several benchmark datasets.
arXiv Detail & Related papers (2025-07-07T17:33:20Z)
- T3DM: Test-Time Training-Guided Distribution Shift Modelling for Temporal Knowledge Graph Reasoning [3.2186308082558632]
A Temporal Knowledge Graph (TKG) is an efficient way of describing the dynamic development of facts along a timeline. We propose a novel distributional feature modeling approach for training TKGR models, Test-Time Training-guided Distribution Shift Modelling (T3DM). In addition, we design a negative-sampling strategy to generate higher-quality negative quadruples based on adversarial training.
arXiv Detail & Related papers (2025-07-02T11:02:37Z)
- Test-Time Training Provably Improves Transformers as In-context Learners [49.09821664572445]
We investigate a gradient-based TTT algorithm for in-context learning.
We train a transformer model on the in-context demonstrations provided in the test prompt.
As our empirical contribution, we study the benefits of TTT for TabPFN.
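The mechanism described in this entry, taking gradient steps on the demonstrations contained in the test prompt before answering the query, can be sketched as follows. The tiny MLP standing in for the transformer, the squared-error loss, and the step count and learning rate are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of gradient-based test-time training on in-context
# demonstrations: adapt a copy of the base predictor on the (x, y) pairs in
# the prompt, then answer the query. The MLP, loss, and hyperparameters are
# placeholders, not the paper's transformer/TabPFN configuration.
import copy
import torch
import torch.nn as nn

base_model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

def ttt_predict(demo_x: torch.Tensor, demo_y: torch.Tensor,
                query_x: torch.Tensor, steps: int = 5, lr: float = 1e-2) -> torch.Tensor:
    model = copy.deepcopy(base_model)        # adapt a copy; the base weights stay fixed
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):                   # a few gradient steps on the prompt's demos
        opt.zero_grad()
        nn.functional.mse_loss(model(demo_x), demo_y).backward()
        opt.step()
    with torch.no_grad():
        return model(query_x)                # prediction after test-time adaptation

# Example: 8 demonstrations with 4-dim inputs, one query point.
demo_x, demo_y = torch.randn(8, 4), torch.randn(8, 1)
print(ttt_predict(demo_x, demo_y, torch.randn(1, 4)))
```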
arXiv Detail & Related papers (2025-03-14T20:06:37Z)
- A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
- FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow [17.919092916953183]
We propose a novel framework, named FlowDreamer, which yields high fidelity results with richer textual details and faster convergence.
The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise.
We introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory.
arXiv Detail & Related papers (2024-08-09T11:40:20Z)
- Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR.
In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model.
To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z)
- Context-Former: Stitching via Latent Conditioned Sequence Modeling [31.250234478757665]
We introduce ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectories.
Experiments show ContextFormer can achieve competitive performance in multiple IL settings.
arXiv Detail & Related papers (2024-01-29T06:05:14Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z)
- Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification [9.82907639745345]
HSI classification methods require high-quality labeled HSIs, which are often costly to obtain.
We propose a novel Attention-Gated Tuning (AGT) strategy and a triplet-structured transformer model, Tri-Former, to address this issue.
arXiv Detail & Related papers (2023-09-22T13:39:24Z)
- Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empirical anchor settings, which makes the algorithms inelegant.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
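The inference loop mentioned in this entry, starting from random boxes and progressively refining them into detections, can be illustrated with a toy loop like the one below; the 7-parameter box encoding, the placeholder denoiser, and the refinement schedule are assumptions and do not reflect the Diff3Det implementation.

```python
# Toy progressive-refinement loop in the spirit of the summary above: sample
# random boxes, then repeatedly move them toward a learned denoiser's
# prediction. The denoiser, box parameterization, and schedule are placeholders.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(7 + 1, 64), nn.ReLU(), nn.Linear(64, 7))

def refine_boxes(num_boxes: int = 32, steps: int = 10) -> torch.Tensor:
    boxes = torch.randn(num_boxes, 7)                  # random (x, y, z, w, l, h, yaw)
    for t in reversed(range(steps)):
        t_embed = torch.full((num_boxes, 1), t / steps)           # timestep conditioning
        pred = denoiser(torch.cat([boxes, t_embed], dim=-1))      # predicted clean boxes
        boxes = boxes + (pred - boxes) / (t + 1)                  # step toward the prediction
    return boxes

print(refine_boxes().shape)   # torch.Size([32, 7])
```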
arXiv Detail & Related papers (2023-09-05T08:49:53Z)
- Elastic Decision Transformer [18.085153645646646]
Elastic Decision Transformer (EDT) is a significant advancement over the existing Decision Transformer (DT).
EDT facilitates trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT.
Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q-learning-based approaches.
arXiv Detail & Related papers (2023-07-05T17:58:21Z)
- Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating data imputation with the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2022-05-19T08:37:56Z)
- Generalized Decision Transformer for Offline Hindsight Information Matching [16.7594941269479]
We present Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem.
We show how different choices for the feature function and the anti-causal aggregator lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future.
arXiv Detail & Related papers (2021-11-19T18:56:13Z)
- Generating Synthetic Training Data for Deep Learning-Based UAV Trajectory Prediction [11.241614693184323]
We present an approach for generating synthetic trajectory data of unmanned-aerial-vehicles (UAVs) in image space.
We show that an RNN-based prediction model solely trained on the generated data can outperform classic reference models on a real-world UAV tracking dataset.
arXiv Detail & Related papers (2021-07-01T13:08:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.