Skill Decision Transformer
- URL: http://arxiv.org/abs/2301.13573v1
- Date: Tue, 31 Jan 2023 11:52:46 GMT
- Title: Skill Decision Transformer
- Authors: Shyam Sudhakaran and Sebastian Risi
- Abstract summary: Large Language Models (LLMs) can be incredibly effective for offline reinforcement learning (RL).
Generalized Decision Transformers (GDTs) have shown that utilizing future trajectory information, in the form of information statistics, can help extract more information from offline trajectory data.
We show that Skill DT can not only perform offline state-marginal matching (SMM), but can also discover descriptive behaviors that can be easily sampled.
- Score: 9.387749254963595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that Large Language Models (LLMs) can be incredibly
effective for offline reinforcement learning (RL) by representing the
traditional RL problem as a sequence modelling problem (Chen et al., 2021;
Janner et al., 2021). However, many of these methods only optimize for high
returns, and may not extract much information from a diverse dataset of
trajectories. Generalized Decision Transformers (GDTs) (Furuta et al., 2021)
have shown that utilizing future trajectory information, in the form of
information statistics, can help extract more information from offline
trajectory data. Building upon this, we propose Skill Decision Transformer
(Skill DT). Skill DT draws inspiration from hindsight relabelling (Andrychowicz
et al., 2017) and skill discovery methods to discover a diverse set of
primitive behaviors, or skills. We show that Skill DT can not only perform
offline state-marginal matching (SMM), but can also discover descriptive behaviors
that can be easily sampled. Furthermore, we show that through purely
reward-free optimization, Skill DT is still competitive with supervised offline
RL approaches on the D4RL benchmark. The code and videos can be found on our
project page: https://github.com/shyamsn97/skill-dt
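To make the skill-conditioning idea concrete, below is a minimal, hypothetical sketch of skill-conditioned action prediction in the spirit of Skill DT: discrete skill codes (e.g., assigned to trajectory segments in hindsight) are embedded and added to state tokens, and a causal transformer predicts each action from the skill-conditioned history. All names and sizes here (SkillDT, n_skills, etc.) are illustrative assumptions, not the authors' implementation; see the project page above for the real code.

```python
import torch
import torch.nn as nn

class SkillDT(nn.Module):
    def __init__(self, state_dim, act_dim, n_skills=16, d=128, n_layers=3):
        super().__init__()
        self.skill_embed = nn.Embedding(n_skills, d)    # learned skill codes
        self.state_embed = nn.Linear(state_dim, d)
        self.act_embed = nn.Linear(act_dim, d)
        enc = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc, num_layers=n_layers)
        self.act_head = nn.Linear(d, act_dim)

    def forward(self, skill_ids, states, actions):
        # skill_ids: (B, T) ints, e.g. assigned to each step in hindsight;
        # states: (B, T, state_dim); actions: (B, T, act_dim).
        B, T, _ = states.shape
        s_tok = self.state_embed(states) + self.skill_embed(skill_ids)
        a_tok = self.act_embed(actions)
        # Interleave tokens as (skill+state)_0, a_0, (skill+state)_1, a_1, ...
        tokens = torch.stack([s_tok, a_tok], dim=2).reshape(B, 2 * T, -1)
        mask = nn.Transformer.generate_square_subsequent_mask(2 * T)
        h = self.backbone(tokens, mask=mask)
        return self.act_head(h[:, 0::2])    # predict a_t at each state position
```

In a model like this, sampling a behavior at test time would amount to fixing a skill id and rolling the policy out, which is one way the "easily sampled" skills described in the abstract could be used.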
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Optimized World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens of data.
Our largest agent, with 150 million parameters, achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on average.
arXiv Detail & Related papers (2024-10-01T10:25:03Z)
- EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data [22.471559284344462]
Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces.
While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks.
We demonstrate through experiments in sparse, image-based, robot manipulation environments that EXTRACT can learn new tasks more quickly than prior works.
arXiv Detail & Related papers (2024-06-25T17:50:03Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
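A rough sketch of the low-rank adaptation idea behind LoRA-DT: the pretrained DT weights are frozen and only a small low-rank update is trained per new task, which is one way to limit forgetting. LoRALinear, the rank r, and the scaling alpha below are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no update at start
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha/r) * B A x: only A and B are trained per task,
        # so the knowledge stored in the frozen W is preserved.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```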
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- When should we prefer Decision Transformers for Offline Reinforcement Learning? [29.107029606830015]
Three popular algorithms for offline RL are Conservative Q-Learning (CQL), Behavior Cloning (BC), and Decision Transformer (DT).
We study this question empirically by exploring the performance of these algorithms across the commonly used D4RL and Robomimic benchmarks.
We find that scaling the amount of data for DT by 5x gives a 2.5x average score improvement on Atari.
arXiv Detail & Related papers (2023-05-23T22:19:14Z)
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using up to 80-million-parameter networks.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z)
- Bootstrapped Transformer for Offline Reinforcement Learning [31.43012728924881]
Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment.
Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem.
We propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data.
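As a minimal, self-contained caricature of that bootstrapping loop (the model, sizes, and data here are stand-in assumptions, not the paper's setup): a sequence model is fit on the offline buffer, then rolled forward to synthesize extra trajectories that are mixed back into the buffer.

```python
import torch
import torch.nn as nn

model = nn.GRU(input_size=4, hidden_size=32, batch_first=True)
head = nn.Linear(32, 4)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
buffer = [torch.randn(1, 10, 4)]                    # stand-in offline trajectories

for _ in range(3):                                  # bootstrap rounds
    for traj in buffer:                             # 1) fit on the current buffer
        out, _ = model(traj[:, :-1])                #    next-step prediction
        loss = ((head(out) - traj[:, 1:]) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                           # 2) self-generate a trajectory
        x, h, gen = torch.zeros(1, 1, 4), None, []
        for _ in range(10):
            out, h = model(x, h)
            x = head(out)
            gen.append(x)
    buffer.append(torch.cat(gen, dim=1))            # 3) mix synthetic data back in
```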
arXiv Detail & Related papers (2022-06-17T05:57:47Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
We also probe the limits of existing RvS methods, which are comparatively weak on random data.
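A hedged sketch of that recipe: condition a two-layer MLP on an outcome (a return-to-go or a goal), regress the dataset actions, and use no TD learning anywhere. Dimensions and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, cond_dim, act_dim, hidden = 17, 1, 6, 256   # illustrative sizes
policy = nn.Sequential(
    nn.Linear(state_dim + cond_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, act_dim),            # predicted action (Gaussian mean)
)

def rvs_loss(states, outcomes, actions):
    # Condition on the outcome and regress dataset actions; MSE equals a
    # Gaussian log-likelihood up to a constant. No TD targets anywhere.
    pred = policy(torch.cat([states, outcomes], dim=-1))
    return ((pred - actions) ** 2).mean()
```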
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Generalized Decision Transformer for Offline Hindsight Information Matching [16.7594941269479]
We present Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem.
We show how different choices for the feature function and the anti-causal aggregator lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future.
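One way to picture the anti-causal aggregation: a feature function phi maps each future state to a feature, and a statistic of the features from step t to the end of the trajectory is computed for every t, then used as the conditioning signal. The reversed cumulative mean below is just one possible aggregator, chosen for illustration; the paper's actual choices differ by variant.

```python
import torch

def anti_causal_statistics(states, phi):
    # states: (T, state_dim); phi maps states to features.
    feats = phi(states)                                       # (T, k)
    # Sum of features from step t to the end (an anti-causal cumulative sum).
    rev_cumsum = torch.flip(torch.cumsum(torch.flip(feats, [0]), 0), [0])
    counts = torch.arange(feats.shape[0], 0, -1).unsqueeze(-1)
    return rev_cumsum / counts                                # mean over the future

# Identity features recover average future state visitations, the kind of
# statistic relevant to state-marginal matching.
z = anti_causal_statistics(torch.randn(100, 11), phi=lambda s: s)
```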
arXiv Detail & Related papers (2021-11-19T18:56:13Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
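For reference, the core AWAC actor update can be sketched as behavior cloning of dataset actions weighted by exponentiated advantages from a learned critic; the networks, the temperature lam, and the weight clipping below are illustrative assumptions rather than the exact published recipe.

```python
import torch

def awac_actor_loss(policy, q_fn, v_fn, states, actions, lam=1.0):
    # policy(states) -> a torch.distributions object over actions
    # (assumed factorized Gaussian, hence the sum over action dims below).
    adv = q_fn(states, actions) - v_fn(states)        # advantage estimate A(s, a)
    weights = torch.exp(adv / lam).clamp(max=20.0)    # exp weights, clipped for stability
    log_prob = policy(states).log_prob(actions).sum(-1)
    return -(weights.detach() * log_prob).mean()      # advantage-weighted behavior cloning
```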
arXiv Detail & Related papers (2020-06-16T17:54:41Z)