In-Context Reinforcement Learning for Variable Action Spaces
- URL: http://arxiv.org/abs/2312.13327v6
- Date: Mon, 1 Jul 2024 12:29:58 GMT
- Title: In-Context Reinforcement Learning for Variable Action Spaces
- Authors: Viacheslav Sinii, Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Sergey Kolesnikov
- Abstract summary: Headless-AD is capable of generalizing to discrete action spaces of variable size, semantic content and order.
We show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered.
- Score: 46.29510499540938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. The introduction of a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue by proposing the Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content and order. By experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations. Implementation is available at: https://github.com/corl-team/headless-ad.
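The abstract does not spell out the architecture, but the central idea it describes, removing the dependence on a predefined action space size, can be illustrated with a minimal sketch: instead of a linear layer with one logit per action, the policy emits a query embedding that is scored against embeddings of whatever candidate actions are currently available. The module and names below (VariableActionScorer, hidden_dim, embed_dim) are illustrative assumptions for exposition, not the authors' Headless-AD implementation.
```python
# Minimal sketch (an assumption, not the Headless-AD reference implementation):
# score a variable-size set of candidate actions by comparing a predicted
# query embedding against per-action embeddings, so the output layer does
# not depend on a fixed action-space size.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariableActionScorer(nn.Module):
    """Hypothetical head: maps a context feature to a query vector and
    scores it against any number of candidate action embeddings."""

    def __init__(self, hidden_dim: int, embed_dim: int):
        super().__init__()
        self.to_query = nn.Linear(hidden_dim, embed_dim)

    def forward(self, context: torch.Tensor, action_embeds: torch.Tensor) -> torch.Tensor:
        # context: (batch, hidden_dim); action_embeds: (num_actions, embed_dim)
        query = self.to_query(context)    # (batch, embed_dim)
        logits = query @ action_embeds.T  # (batch, num_actions)
        return F.softmax(logits, dim=-1)  # distribution over the current action set


if __name__ == "__main__":
    torch.manual_seed(0)
    head = VariableActionScorer(hidden_dim=64, embed_dim=32)
    context = torch.randn(1, 64)
    # The same head handles 5 actions now and 9 actions later, in any order.
    for num_actions in (5, 9):
        actions = torch.randn(num_actions, 32)  # stand-in action embeddings
        probs = head(context, actions)
        print(num_actions, probs.shape)         # (1, 5) then (1, 9)
```
Because the output dimension equals the number of candidate embeddings passed in at call time, such a head can in principle be reused when actions are added, removed, or reordered without retraining a fixed softmax layer.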
Related papers
- Few Shot Class Incremental Learning using Vision-Language models [24.930246674021525]
In this study, we introduce an innovative few-shot class incremental learning (FSCIL) framework that utilizes a language regularizer and a subspace regularizer.
Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes.
arXiv Detail & Related papers (2024-05-02T06:52:49Z) - Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning [65.57123249246358]
We propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL.
We train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces.
Our prototype complement strategy synthesizes old classes' new features without using any old class instance.
arXiv Detail & Related papers (2024-03-18T17:58:13Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - Generalization to New Sequential Decision Making Tasks with In-Context Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z) - Building a Subspace of Policies for Scalable Continual Learning [21.03369477853538]
We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
arXiv Detail & Related papers (2022-11-18T14:59:42Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Adversarial Training of Variational Auto-encoders for Continual Zero-shot Learning [1.90365714903665]
We present a hybrid network that consists of a shared VAE module to hold information of all tasks and task-specific private VAE modules for each task.
The model's size grows with each task to prevent catastrophic forgetting of task-specific skills.
We show our method is superior in class-sequential learning with ZSL (Zero-Shot Learning) and GZSL (Generalized Zero-Shot Learning).
arXiv Detail & Related papers (2021-02-07T11:21:24Z) - COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z) - Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)