In-Context Reinforcement Learning for Variable Action Spaces
- URL: http://arxiv.org/abs/2312.13327v6
- Date: Mon, 1 Jul 2024 12:29:58 GMT
- Title: In-Context Reinforcement Learning for Variable Action Spaces
- Authors: Viacheslav Sinii, Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Sergey Kolesnikov
- Abstract summary: Headless-AD is capable of generalizing to discrete action spaces of variable size, semantic content and order.
We show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered.
- Score: 46.29510499540938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. The introduction of a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue by proposing the Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content and order. By experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations. Implementation is available at: https://github.com/corl-team/headless-ad.
Related papers
- A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z)
- POA: Pre-training Once for Models of All Sizes [33.72644336390202]
We propose a novel tri-branch self-supervised training framework, termed POA (Pre-training Once for All).
Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm.
It achieves state-of-the-art performance using ViT, Swin Transformer and ResNet backbones.
arXiv Detail & Related papers (2024-08-02T06:13:29Z)
- Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning [65.57123249246358]
We propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL.
We train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces.
Our prototype complement strategy synthesizes old classes' new features without using any old class instance.
arXiv Detail & Related papers (2024-03-18T17:58:13Z)
- Generalization to New Sequential Decision Making Tasks with In-Context Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z)
- Building a Subspace of Policies for Scalable Continual Learning [21.03369477853538]
We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
arXiv Detail & Related papers (2022-11-18T14:59:42Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)