Contextual Transformer for Offline Meta Reinforcement Learning
- URL: http://arxiv.org/abs/2211.08016v1
- Date: Tue, 15 Nov 2022 10:00:14 GMT
- Title: Contextual Transformer for Offline Meta Reinforcement Learning
- Authors: Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung,
Haifeng Zhang, Jun Wang, Yali Du and Yaodong Yang
- Abstract summary: We show how prompts can improve sequence modeling-based offline reinforcement learning (offline RL) algorithms.
We propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.
We extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks.
- Score: 16.587320914107128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pretrain-finetuning paradigm in large-scale sequence models has made
significant progress in natural language processing and computer vision tasks.
However, such a paradigm is still hindered by several challenges in
Reinforcement Learning (RL), including the lack of self-supervised pretraining
algorithms based on offline data and efficient fine-tuning/prompt-tuning over
unseen downstream tasks. In this work, we explore how prompts can improve
sequence modeling-based offline reinforcement learning (offline-RL) algorithms.
Firstly, we propose prompt tuning for offline RL, where a context vector
sequence is concatenated with the input to guide the conditional policy
generation. As such, we can pretrain a model on the offline dataset with
self-supervised loss and learn a prompt to guide the policy towards desired
actions. Secondly, we extend our framework to Meta-RL settings and propose
Contextual Meta Transformer (CMT); CMT leverages the context among different
tasks as the prompt to improve generalization on unseen tasks. We conduct
extensive experiments across three different offline-RL settings: offline
single-agent RL on the D4RL dataset, offline Meta-RL on the MuJoCo benchmark,
and offline MARL on the SMAC benchmark. Superior results validate the strong
performance and generality of our methods.
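As a rough illustration of the prompt-tuning idea described above, the following is a minimal PyTorch sketch, not the authors' implementation: the class name, dimensions, the plain (non-causal) encoder, and the behavior-cloning loss are simplifying assumptions, since the paper's sequence model would use return-conditioned trajectory tokens and a causal transformer. A learnable context-vector sequence is concatenated in front of the trajectory tokens, the pretrained body is kept frozen, and only the prompt is optimized on the downstream task.

```python
import torch
import torch.nn as nn

class PromptedSequencePolicy(nn.Module):
    """Hypothetical sequence-modeling policy with a learnable prompt."""

    def __init__(self, token_dim=128, prompt_len=8, act_dim=6, n_layers=3, n_heads=4):
        super().__init__()
        # Stand-in for a pretrained trajectory transformer (no causal mask, for brevity).
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=n_heads, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(token_dim, act_dim)
        # The prompt: a short sequence of context vectors prepended to the input.
        self.prompt = nn.Parameter(0.02 * torch.randn(1, prompt_len, token_dim))

    def forward(self, traj_tokens):
        # traj_tokens: (batch, seq_len, token_dim) embedded trajectory tokens.
        prompt = self.prompt.expand(traj_tokens.shape[0], -1, -1)
        h = self.body(torch.cat([prompt, traj_tokens], dim=1))
        # Predict actions only at trajectory positions (drop the prompt slots).
        return self.action_head(h[:, prompt.shape[1]:, :])

policy = PromptedSequencePolicy()

# Prompt tuning: freeze everything that was pretrained and optimize only the prompt.
for name, param in policy.named_parameters():
    if name != "prompt":
        param.requires_grad_(False)
optimizer = torch.optim.Adam([policy.prompt], lr=1e-3)

traj_tokens = torch.randn(4, 20, 128)   # dummy embedded offline batch
target_actions = torch.randn(4, 20, 6)  # dummy action targets
loss = nn.functional.mse_loss(policy(traj_tokens), target_actions)
loss.backward()
optimizer.step()
```

In the CMT-style meta-RL variant, the prompt would not be a free parameter tuned per task; it could instead be produced by encoding a few transitions collected from the target task, so that a single model conditions on task context and can generalize to unseen tasks.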
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Offline Trajectory Generalization for Offline Reinforcement Learning [43.89740983387144]
Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories.
We propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO).
OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with better generalization capability of transformers and high-rewarded data augmentation.
arXiv Detail & Related papers (2024-04-16T08:48:46Z) - Bootstrapped Transformer for Offline Reinforcement Learning [31.43012728924881]
offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment.
Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem.
We propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data.
arXiv Detail & Related papers (2022-06-17T05:57:47Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Offline Meta-Reinforcement Learning with Advantage Weighting [125.21298190780259]
This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting.
Offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data.
We propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training.
arXiv Detail & Related papers (2020-08-13T17:57:14Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them with rewards artificially penalized by the uncertainty of the learned dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
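As a hedged illustration of the uncertainty-penalized reward mentioned above (the notation below is ours, not necessarily the paper's): $\tilde{r}(s,a) = \hat{r}(s,a) - \lambda\, u(s,a)$, where $\hat{r}$ is the reward estimate, $u(s,a)$ is an uncertainty estimate of the learned dynamics at $(s,a)$, and $\lambda > 0$ is a penalty coefficient; the policy is then optimized against $\tilde{r}$ inside the learned model.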