A Minimalist Prompt for Zero-Shot Policy Learning
- URL: http://arxiv.org/abs/2405.06063v1
- Date: Thu, 9 May 2024 19:15:33 GMT
- Title: A Minimalist Prompt for Zero-Shot Policy Learning
- Authors: Meng Song, Xuezhi Wang, Tanay Biradar, Yao Qin, Manmohan Chandraker,
- Abstract summary: Transformer-based methods have exhibited significant generalization ability when prompted with target-domain demonstrations or example solutions during inference.
We show that conditioning a decision transformer on task parameters alone can enable zero-shot generalization on par with or better than its demonstration-conditioned counterpart.
We introduce an additional learnable prompt which is demonstrated to further boost zero-shot generalization across a range of robotic control, manipulation, and navigation benchmark tasks.
- Score: 61.65128628634916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based methods have exhibited significant generalization ability when prompted with target-domain demonstrations or example solutions during inference. Although demonstrations, as a way of task specification, can capture rich information that may be hard to specify by language, it remains unclear what information is extracted from the demonstrations to help generalization. Moreover, assuming access to demonstrations of an unseen task is impractical or unreasonable in many real-world scenarios, especially in robotics applications. These questions motivate us to explore what the minimally sufficient prompt could be to elicit the same level of generalization ability as the demonstrations. We study this problem in the contextural RL setting which allows for quantitative measurement of generalization and is commonly adopted by meta-RL and multi-task RL benchmarks. In this setting, the training and test Markov Decision Processes (MDPs) only differ in certain properties, which we refer to as task parameters. We show that conditioning a decision transformer on these task parameters alone can enable zero-shot generalization on par with or better than its demonstration-conditioned counterpart. This suggests that task parameters are essential for the generalization and DT models are trying to recover it from the demonstration prompt. To extract the remaining generalizable information from the supervision, we introduce an additional learnable prompt which is demonstrated to further boost zero-shot generalization across a range of robotic control, manipulation, and navigation benchmark tasks.
Related papers
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation
Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - Meta-training with Demonstration Retrieval for Efficient Few-shot
Learning [11.723856248352007]
Large language models show impressive results on few-shot NLP tasks.
These models are memory and computation-intensive.
We propose meta-training with demonstration retrieval.
arXiv Detail & Related papers (2023-06-30T20:16:22Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - Reinforcement Learning with Prototypical Representations [114.35801511501639]
Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations.
These prototypes simultaneously serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations.
This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
arXiv Detail & Related papers (2021-02-22T18:56:34Z) - Generalized Hidden Parameter MDPs Transferable Model-based RL in a
Handful of Trials [13.051708608864539]
Generalized Hidden MDPs (GHP-MDPs) describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks.
We experimentally demonstrate state-of-the-art performance and sample-efficiency on a new challenging MuJoCo task using reward and dynamics latent spaces.
arXiv Detail & Related papers (2020-02-08T02:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.