Bootstrapped Transformer for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2206.08569v1
- Date: Fri, 17 Jun 2022 05:57:47 GMT
- Title: Bootstrapped Transformer for Offline Reinforcement Learning
- Authors: Kerong Wang, Hanye Zhao, Xufang Luo, Kan Ren, Weinan Zhang, Dongsheng Li
- Abstract summary: Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment.
Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem.
We propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data.
- Score: 31.43012728924881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) aims at learning policies from previously
collected static trajectory data without interacting with the real environment.
Recent works provide a novel perspective by viewing offline RL as a generic
sequence generation problem, adopting sequence models such as Transformer
architecture to model distributions over trajectories, and repurposing beam
search as a planning algorithm. However, the training datasets utilized in
general offline RL tasks are quite limited and often suffer from insufficient
distribution coverage, which could be harmful to training sequence generation
models yet has not drawn enough attention in the previous works. In this paper,
we propose a novel algorithm named Bootstrapped Transformer, which incorporates
the idea of bootstrapping and leverages the learned model to self-generate more
offline data to further boost the sequence model training. We conduct extensive
experiments on two offline RL benchmarks and demonstrate that our model can
largely remedy the existing offline RL training limitations and beat other
strong baseline methods. We also analyze the generated pseudo data, and the
revealed characteristics may shed some light on offline RL training. The code
is available at https://seqml.github.io/bootorl.
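The loop described in the abstract (train a sequence model on the static trajectories, let it self-generate extra data, then train on the mixture) can be sketched compactly. The following is a minimal, self-contained illustration of that bootstrapping idea, not the authors' implementation: the stand-in SequenceModel, the confidence filter, and all hyperparameters are assumptions for illustration only.

```python
# Minimal sketch of the bootstrapping loop: fit a sequence model on the offline
# trajectories, sample "pseudo" trajectories from it, keep only the generations
# the model scores as most likely, and retrain on the union. The SequenceModel
# is a simple bigram stand-in, NOT the paper's Transformer.
import numpy as np

VOCAB = 16      # trajectories treated as token sequences (an assumption of this sketch)
SEQ_LEN = 20

class SequenceModel:
    """Placeholder autoregressive model with fit / sample / log_prob."""
    def __init__(self):
        self.counts = np.ones((VOCAB, VOCAB))   # Laplace-smoothed bigram counts

    def fit(self, sequences):
        for seq in sequences:
            for a, b in zip(seq[:-1], seq[1:]):
                self.counts[a, b] += 1

    def _probs(self):
        return self.counts / self.counts.sum(axis=1, keepdims=True)

    def sample(self, n, rng):
        probs, out = self._probs(), []
        for _ in range(n):
            seq = [rng.integers(VOCAB)]
            for _ in range(SEQ_LEN - 1):
                seq.append(rng.choice(VOCAB, p=probs[seq[-1]]))
            out.append(np.array(seq))
        return out

    def log_prob(self, seq):
        probs = self._probs()
        return sum(np.log(probs[a, b]) for a, b in zip(seq[:-1], seq[1:]))

def bootstrap_training(real_trajs, rounds=3, n_pseudo=64, keep_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    model = SequenceModel()
    model.fit(real_trajs)
    data = list(real_trajs)
    for _ in range(rounds):
        pseudo = model.sample(n_pseudo, rng)
        # Keep only the generations the model itself scores as most likely
        # (a stand-in for confidence-based filtering).
        pseudo.sort(key=model.log_prob, reverse=True)
        data = list(real_trajs) + pseudo[: int(keep_ratio * n_pseudo)]
        model = SequenceModel()
        model.fit(data)
    return model, data

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    real = [rng.integers(0, VOCAB, size=SEQ_LEN) for _ in range(32)]
    model, augmented = bootstrap_training(real)
    print(f"{len(augmented)} trajectories after bootstrapping (32 real + pseudo)")
```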
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Offline Trajectory Generalization for Offline Reinforcement Learning [43.89740983387144]
Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories.
We propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO).
OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with better generalization capability of transformers and high-rewarded data augmentation.
arXiv Detail & Related papers (2024-04-16T08:48:46Z)
- Contextual Transformer for Offline Meta Reinforcement Learning [16.587320914107128]
We show how prompts can improve sequence modeling-based offline reinforcement learning (offline RL) algorithms.
We propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.
We extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks.
arXiv Detail & Related papers (2022-11-15T10:00:14Z)
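For the prompt-tuning idea summarized in the entry above, concatenating a learned context-vector sequence with the embedded input of a transformer policy might look roughly as follows. The module, dimensions, and the omitted causal mask are illustrative assumptions, not the CMT authors' code.

```python
# Sketch: a learned sequence of context vectors is prepended to the embedded
# input tokens before they enter a transformer policy head.
import torch
import torch.nn as nn

class PromptedSequencePolicy(nn.Module):
    def __init__(self, input_dim=11, act_dim=3, d_model=64, prompt_len=5):
        super().__init__()
        self.embed = nn.Linear(input_dim, d_model)    # embeds per-timestep trajectory tokens
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)  # learned context vectors
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)   # no causal mask, for brevity
        self.head = nn.Linear(d_model, act_dim)       # predicts the next action

    def forward(self, tokens):                        # tokens: (batch, T, input_dim)
        x = self.embed(tokens)
        ctx = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)
        x = torch.cat([ctx, x], dim=1)                # concatenate the context with the input
        h = self.encoder(x)
        return self.head(h[:, self.prompt.size(0):])  # actions for the original timesteps only

if __name__ == "__main__":
    policy = PromptedSequencePolicy()
    batch = torch.randn(8, 10, 11)                    # 8 trajectories, 10 timesteps
    print(policy(batch).shape)                        # torch.Size([8, 10, 3])
```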
- Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination [31.805991958408438]
We propose to augment the offline dataset by using trained bidirectional dynamics models and rollout policies with double check.
Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method.
arXiv Detail & Related papers (2022-06-16T08:00:44Z)
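One plausible reading of the "double check" in the entry above is cycle consistency: a transition imagined by the forward dynamics model is kept only if the backward model maps it back close to the starting state. The toy linear models and the agreement threshold below are assumptions used purely for illustration.

```python
# Sketch: forward-imagined transitions are kept only when the backward model
# agrees, i.e. it reconstructs (approximately) the state we started from.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])       # stand-in "true" dynamics s' = A s + B a
B = np.array([[0.0], [0.1]])

def forward_model(s, a):                      # learned forward model (here: slightly noisy truth)
    return A @ s + B @ a + rng.normal(0, 0.01, size=2)

def backward_model(s_next, a):                # learned backward model: predicts the previous state
    return np.linalg.solve(A, s_next - B @ a) + rng.normal(0, 0.01, size=2)

def double_check_rollout(s0, policy, horizon=10, tol=0.05):
    """Imagine a forward rollout, keeping only transitions both models agree on."""
    kept, s = [], s0
    for _ in range(horizon):
        a = policy(s)
        s_next = forward_model(s, a)
        s_back = backward_model(s_next, a)            # the double check
        if np.linalg.norm(s_back - s) < tol:          # models agree -> trust the sample
            kept.append((s.copy(), a.copy(), s_next.copy()))
        s = s_next
    return kept

policy = lambda s: np.array([-0.5 * s[1]])            # arbitrary rollout policy
samples = double_check_rollout(np.array([1.0, 0.0]), policy)
print(f"kept {len(samples)} of 10 imagined transitions")
```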
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
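The relabeling step in the ExORL pipeline above (reward-free exploratory data stamped with a downstream reward before offline training) is simple to sketch. The dataset layout and the example reward function below are assumptions, not the ExORL release.

```python
# Sketch: attach a downstream task's rewards to reward-free (s, a, s') data,
# producing a dataset any off-the-shelf offline RL algorithm can consume.
import numpy as np

def relabel(dataset, reward_fn):
    """Attach task rewards to reward-free (s, a, s') transitions."""
    out = dict(dataset)
    out["rewards"] = np.array([
        reward_fn(s, a, s_next)
        for s, a, s_next in zip(dataset["obs"], dataset["actions"], dataset["next_obs"])
    ])
    return out

# Reward-free exploratory data (e.g., from an unsupervised exploration method).
rng = np.random.default_rng(0)
exploratory = {
    "obs": rng.normal(size=(1000, 4)),
    "actions": rng.uniform(-1, 1, size=(1000, 2)),
    "next_obs": rng.normal(size=(1000, 4)),
}

# Downstream task: reach the origin (an arbitrary example reward).
reach_origin = lambda s, a, s_next: -float(np.linalg.norm(s_next))

labeled = relabel(exploratory, reach_origin)
print(labeled["rewards"].shape)     # (1000,) -- ready for a standard offline RL learner
```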
- Boosting Offline Reinforcement Learning with Residual Generative Modeling [27.50950972741753]
Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience without online exploration.
We show that our method can learn more accurate policy approximations in different benchmark datasets.
In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game Honor of Kings.
arXiv Detail & Related papers (2021-06-19T03:41:14Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
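The penalization described in the MOPO entry above amounts to replacing the model's reward with r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a), where u estimates the dynamics model's uncertainty. The sketch below uses ensemble disagreement as one common uncertainty estimate; the penalty weight and toy ensemble are illustrative assumptions, not the authors' exact estimator.

```python
# Sketch: inside model-based rollouts, reduce the predicted reward by a term
# proportional to how much a small dynamics ensemble disagrees about s'.
import numpy as np

rng = np.random.default_rng(0)
ensemble = [
    (rng.normal(0, 1, size=(4, 4)), rng.normal(0, 1, size=(4, 2)))   # per-member (A_i, B_i): s' = A_i s + B_i a
    for _ in range(5)
]

def penalized_reward(s, a, reward_hat, lam=1.0):
    """r_tilde(s, a) = r_hat(s, a) - lam * u(s, a)."""
    preds = np.stack([A @ s + B @ a for A, B in ensemble])
    uncertainty = preds.std(axis=0).max()        # disagreement across ensemble members
    return reward_hat - lam * uncertainty

s, a = rng.normal(size=4), rng.uniform(-1, 1, size=2)
print(penalized_reward(s, a, reward_hat=1.0))    # smaller than 1.0 wherever the models disagree
```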