Waypoint Transformer: Reinforcement Learning via Supervised Learning
with Intermediate Targets
- URL: http://arxiv.org/abs/2306.14069v2
- Date: Sat, 18 Nov 2023 08:57:40 GMT
- Title: Waypoint Transformer: Reinforcement Learning via Supervised Learning
with Intermediate Targets
- Authors: Anirudhan Badrinath and Yannis Flet-Berliac and Allen Nie and Emma
Brunskill
- Abstract summary: We present a novel approach to enhance RvS methods by integrating intermediate targets.
We introduce the Waypoint Transformer (WT), whose architecture builds upon the DT framework and is conditioned on automatically generated waypoints.
The results show a significant increase in the final return compared to existing RvS methods, with performance on par with or greater than existing state-of-the-art temporal difference learning-based methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent advancements in offline reinforcement learning via
supervised learning (RvS) and the success of the decision transformer (DT)
architecture in various domains, DTs have fallen short in several challenging
benchmarks. The root cause of this underperformance lies in their inability to
seamlessly connect segments of suboptimal trajectories. To overcome this
limitation, we present a novel approach to enhance RvS methods by integrating
intermediate targets. We introduce the Waypoint Transformer (WT), whose
architecture builds upon the DT framework and is conditioned on
automatically generated waypoints. The results show a significant increase in
the final return compared to existing RvS methods, with performance on par
with or greater than existing state-of-the-art temporal difference learning-based
methods. Additionally, the performance and stability improvements are largest
in the most challenging environments and data configurations, including AntMaze
Large Play/Diverse and Kitchen Mixed/Partial.
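To make the conditioning scheme concrete, below is a minimal sketch of RvS training with waypoint targets. It is an illustration under simplifying assumptions, not the authors' implementation: the waypoint for step t is simply taken to be the observed state k steps ahead in the same trajectory, and all names (WaypointConditionedPolicy, make_waypoints, horizon_k) are hypothetical.

import torch
import torch.nn as nn

class WaypointConditionedPolicy(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        # Each timestep token encodes the current state together with its waypoint.
        self.embed = nn.Linear(2 * state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, states, waypoints):
        # states, waypoints: (batch, seq_len, state_dim)
        tokens = self.embed(torch.cat([states, waypoints], dim=-1))
        t = tokens.size(1)
        # Causal mask so the action at step t depends only on steps <= t.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        return self.head(self.encoder(tokens, mask=mask))

def make_waypoints(states, horizon_k):
    # The waypoint for step t is the state at t + k, clamped at the trajectory end.
    idx = torch.clamp(torch.arange(states.size(1)) + horizon_k, max=states.size(1) - 1)
    return states[:, idx]

# Supervised (RvS) training step: regress logged actions given state + waypoint.
states, actions = torch.randn(8, 50, 17), torch.randn(8, 50, 6)
policy = WaypointConditionedPolicy(state_dim=17, act_dim=6)
pred = policy(states, make_waypoints(states, horizon_k=10))
loss = ((pred - actions) ** 2).mean()
loss.backward()

In this sketch the policy is trained purely by supervised regression onto logged actions; the waypoint input plays the role that return-to-go plays in a standard decision transformer.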
Related papers
- Predictive Coding for Decision Transformer [21.28952990360392]
The decision transformer (DT) architecture has shown promise across various domains.
Despite this initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL.
We propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods.
arXiv Detail & Related papers (2024-10-04T13:17:34Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, only a minimal number of late pre-trained layers is used, which alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
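As a hedged illustration of the two-route idea in the SHERL entry above: consolidate frozen early-layer outputs into one compact summary and train only a few late layers plus a head, so backpropagation never touches the early backbone. The module name, the softmax gating used as a stand-in for the anti-redundancy operation, and all dimensions below are assumptions, not the paper's code.

import torch
import torch.nn as nn

class TwoRouteAdapter(nn.Module):
    def __init__(self, backbone_layers, n_late, feat_dim, n_classes):
        super().__init__()
        self.early = nn.ModuleList(backbone_layers[:-n_late]).requires_grad_(False)
        self.late = nn.ModuleList(backbone_layers[-n_late:])
        # Learnable weights that consolidate early outputs (a simple stand-in
        # for an "anti-redundancy" operation).
        self.gate = nn.Parameter(torch.zeros(len(self.early)))
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        feats = []
        with torch.no_grad():          # early route: frozen, no activations stored
            for layer in self.early:
                x = layer(x)
                feats.append(x)
        weights = torch.softmax(self.gate, dim=0)
        x = sum(w * f for w, f in zip(weights, feats))   # consolidated summary
        for layer in self.late:        # late route: few trainable layers
            x = layer(x)
        return self.head(x)

# Usage with a toy frozen MLP backbone (dimensions are illustrative).
backbone = [nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(6)]
model = TwoRouteAdapter(backbone, n_late=2, feat_dim=64, n_classes=10)
logits = model(torch.randn(32, 64))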
- DiffClass: Diffusion-Based Class Incremental Learning [30.514281721324853]
Class Incremental Learning (CIL) is challenging due to catastrophic forgetting.
Recent exemplar-free CIL methods attempt to mitigate catastrophic forgetting by synthesizing previous task data.
We propose a novel exemplar-free CIL method to overcome these issues.
arXiv Detail & Related papers (2024-03-08T03:34:18Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
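The LoRA-DT idea above can be pictured with a standard low-rank adapter around a frozen linear layer of a trained decision transformer. This is a generic LoRA sketch, not the paper's implementation; the wrapper name, rank, and scaling are assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T ; the additive update has rank <= `rank`.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Usage: wrap, e.g., an attention projection of a trained DT for a new task.
# Only A and B are trained per task, so knowledge stored in the frozen base
# weights is not overwritten, which is what mitigates forgetting.
proj = LoRALinear(nn.Linear(128, 128))
y = proj(torch.randn(4, 20, 128))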
- Delving into Transformer for Incremental Semantic Segmentation [24.811247377533178]
Incremental semantic segmentation (ISS) is an emerging task in which an old model is updated by adding new classes.
In this work, we propose TISS, a Transformer-based method for ISS.
Under extensive experimental settings, our method significantly outperforms state-of-the-art incremental semantic segmentation methods.
arXiv Detail & Related papers (2022-11-18T14:16:04Z)
- Enhancing the Robustness, Efficiency, and Diversity of Differentiable Architecture Search [25.112048502327738]
Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency.
Many works attempt to restrict the accumulation of skip connections by indicators or manual design.
We suggest a more subtle and direct approach that removes skip connections from the operation space.
arXiv Detail & Related papers (2022-04-10T13:25:36Z)
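The fix described in the DARTS entry above is easy to picture: a mixed operation whose candidate set simply omits skip_connect, so the softmax-relaxed architecture weights cannot collapse onto parameter-free identity edges. The sketch below is illustrative; the particular candidate operations are a simplified stand-in for the full DARTS search space.

import torch
import torch.nn as nn

# Candidate operations WITHOUT skip_connect; all preserve spatial size.
def make_ops(channels):
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1),
        nn.Conv2d(channels, channels, 5, padding=2),
        nn.AvgPool2d(3, stride=1, padding=1),
        nn.MaxPool2d(3, stride=1, padding=1),
    ])

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = make_ops(channels)
        # One architecture parameter per candidate op, relaxed via softmax.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

out = MixedOp(16)(torch.randn(2, 16, 32, 32))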
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with a wide range of potential applications.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experimental results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z)
- Multi-task Over-the-Air Federated Learning: A Non-Orthogonal Transmission Approach [52.85647632037537]
We propose a multi-task over-the-air federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of an edge server (ES).
Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
arXiv Detail & Related papers (2021-06-27T13:09:32Z)
- InverseForm: A Loss Function for Structured Boundary-Aware Segmentation [80.39674800972182]
We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network.
This plug-in loss term complements the cross-entropy loss in capturing boundary transformations.
We analyze the quantitative and qualitative effects of our loss function on three indoor and outdoor segmentation benchmarks.
arXiv Detail & Related papers (2021-04-06T18:52:45Z)
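The plug-in structure of a boundary-aware term like the one in the InverseForm entry above can be sketched generically: a total loss that adds a boundary-agreement penalty to cross-entropy. This is not InverseForm itself; a simple L1 difference of soft boundary maps stands in for the paper's learned inverse-transformation distance, and the function names and weighting are assumptions.

import torch
import torch.nn.functional as F

def boundary_map(probs):
    # Soft boundary map: spatial gradient magnitude of per-class probabilities.
    dy = (probs[:, :, 1:, :] - probs[:, :, :-1, :]).abs().sum(1)
    dx = (probs[:, :, :, 1:] - probs[:, :, :, :-1]).abs().sum(1)
    return F.pad(dy, (0, 0, 0, 1)) + F.pad(dx, (0, 1, 0, 0))

def segmentation_loss(logits, labels, n_classes, lam=0.5):
    # Standard pixel-wise cross-entropy ...
    ce = F.cross_entropy(logits, labels)
    pred_b = boundary_map(logits.softmax(1))
    true_b = boundary_map(F.one_hot(labels, n_classes).permute(0, 3, 1, 2).float())
    # ... plus a plug-in penalty on boundary-map disagreement.
    return ce + lam * (pred_b - true_b).abs().mean()

logits = torch.randn(2, 5, 64, 64, requires_grad=True)
labels = torch.randint(0, 5, (2, 64, 64))
segmentation_loss(logits, labels, n_classes=5).backward()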
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.