Online Decision Transformer
- URL: http://arxiv.org/abs/2202.05607v1
- Date: Fri, 11 Feb 2022 13:43:24 GMT
- Title: Online Decision Transformer
- Authors: Qinqing Zheng, Amy Zhang, Aditya Grover
- Abstract summary: Offline reinforcement learning (RL) can be formulated as a sequence modeling problem.
Online Decision Transformers (ODT) is an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning.
- Score: 30.54774566089644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that offline reinforcement learning (RL) can be
formulated as a sequence modeling problem (Chen et al., 2021; Janner et al.,
2021) and solved via approaches similar to large-scale language modeling.
However, any practical instantiation of RL also involves an online component,
where policies pretrained on passive offline datasets are finetuned via
task-specific interactions with the environment. We propose Online Decision
Transformers (ODT), an RL algorithm based on sequence modeling that blends
offline pretraining with online finetuning in a unified framework. Our
framework uses sequence-level entropy regularizers in conjunction with
autoregressive modeling objectives for sample-efficient exploration and
finetuning. Empirically, we show that ODT is competitive with the
state-of-the-art in absolute performance on the D4RL benchmark but shows much
more significant gains during the finetuning procedure.
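As a concrete illustration of the objective sketched in the abstract, below is a minimal Python sketch of an entropy-regularized autoregressive action loss paired with a dual (temperature) update. The `policy` interface, batch layout, and variable names are assumptions made for illustration, not the authors' implementation.

```python
import torch  # the sketch assumes PyTorch tensors and distributions


def odt_style_losses(policy, batch, log_temperature, target_entropy):
    """Entropy-regularized autoregressive action modeling (a sketch).

    `policy` is assumed to map (returns-to-go, states, actions) to a
    torch.distributions.Distribution over the action at every timestep.
    """
    dist = policy(batch["returns_to_go"], batch["states"], batch["actions"])
    nll = -dist.log_prob(batch["actions"]).mean()   # autoregressive action NLL
    entropy = dist.entropy().mean()                 # sequence-level entropy
    temperature = log_temperature.exp()

    # Maximize action log-likelihood while keeping entropy above a target;
    # the temperature (Lagrange multiplier) is trained against that constraint.
    policy_loss = nll - temperature.detach() * entropy
    temperature_loss = temperature * (entropy.detach() - target_entropy)
    return policy_loss, temperature_loss
```

Under this reading, offline pretraining would minimize the same action-modeling loss on the static dataset, and online finetuning would keep applying it to trajectories collected by rolling out the stochastic policy, which is roughly the offline-to-online recipe the abstract outlines.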
Related papers
- Offline Trajectory Generalization for Offline Reinforcement Learning [43.89740983387144]
Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories.
We propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO).
OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with better generalization capability of transformers and high-rewarded data augmentation.
arXiv Detail & Related papers (2024-04-16T08:48:46Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Multi-Objective Decision Transformers for Offline Reinforcement Learning [7.386356540208436]
Offline RL is structured to derive policies from static trajectory data without requiring real-time environment interactions.
We reformulate offline RL as a multi-objective optimization problem, where prediction is extended to states and returns.
Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model.
arXiv Detail & Related papers (2023-08-31T00:47:58Z)
- Graph Decision Transformer [83.76329715043205]
Graph Decision Transformer (GDT) is a novel offline reinforcement learning approach.
GDT models the input sequence into a causal graph to capture potential dependencies between fundamentally different concepts.
Our experiments show that GDT matches or surpasses the performance of state-of-the-art offline RL methods on image-based Atari and OpenAI Gym.
arXiv Detail & Related papers (2023-03-07T09:10:34Z)
- Contextual Transformer for Offline Meta Reinforcement Learning [16.587320914107128]
We show how prompts can improve sequence modeling-based offline reinforcement learning (offline RL) algorithms.
We propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.
We extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks.
arXiv Detail & Related papers (2022-11-15T10:00:14Z)
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them on rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
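The MOPO entry above describes rewards penalized by the uncertainty of the learned dynamics; below is a minimal sketch of that penalty, assuming an ensemble of probabilistic dynamics models. The function and argument names are illustrative, not the paper's code.

```python
import numpy as np


def uncertainty_penalized_reward(predicted_reward: np.ndarray,
                                 ensemble_std: np.ndarray,
                                 penalty_coef: float = 1.0) -> np.ndarray:
    """r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a).

    `ensemble_std` has shape (n_models, batch, state_dim); the uncertainty
    u(s, a) is taken here as the largest predicted standard-deviation norm
    across ensemble members, one common heuristic choice.
    """
    uncertainty = np.linalg.norm(ensemble_std, axis=-1).max(axis=0)  # (batch,)
    return predicted_reward - penalty_coef * uncertainty
```

Policy optimization then proceeds with any model-based RL method on the penalized reward, which discourages the policy from exploiting regions where the learned model is unreliable.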