Towards Flexible Inference in Sequential Decision Problems via
Bidirectional Transformers
- URL: http://arxiv.org/abs/2204.13326v1
- Date: Thu, 28 Apr 2022 07:50:08 GMT
- Title: Towards Flexible Inference in Sequential Decision Problems via
Bidirectional Transformers
- Authors: Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun,
David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca
Dragan, Sam Devlin
- Abstract summary: We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks.
A single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models.
- Score: 17.09745648221254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomly masking and predicting word tokens has been a successful approach in
pre-training language models for a variety of downstream tasks. In this work,
we observe that the same idea also applies naturally to sequential decision
making, where many well-studied tasks like behavior cloning, offline RL,
inverse dynamics, and waypoint conditioning correspond to different sequence
maskings over a sequence of states, actions, and returns. We introduce the
FlexiBiT framework, which provides a unified way to specify models which can be
trained on many different sequential decision making tasks. We show that a
single FlexiBiT model is simultaneously capable of carrying out many tasks with
performance similar to or better than specialized models. Additionally, we show
that performance can be further improved by fine-tuning our general model on
specific tasks of interest.
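To make the masking view concrete, below is a minimal NumPy sketch of how different tasks can be expressed as visibility masks over a per-timestep (state, action, return) token grid. The task names, token layout, and masking rules here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Illustrative sketch only (not the paper's released code): a trajectory is
# laid out as a (horizon, 3) grid of tokens, one (state, action, return)
# triple per timestep. A "task" is a choice of which tokens are visible to
# the model (True) and which are masked and must be predicted (False).
S, A, R = 0, 1, 2  # column offsets: state, action, return-to-go


def task_mask(task: str, horizon: int, t: int = 0, rng=None) -> np.ndarray:
    """Build a (horizon, 3) visibility mask for a hypothetical task name."""
    mask = np.zeros((horizon, 3), dtype=bool)
    if task == "behavior_cloning":
        # Past states and actions visible up to time t; predict action a_t.
        mask[: t + 1, S] = True
        mask[:t, A] = True
    elif task == "offline_rl":
        # States and desired returns visible everywhere; predict all actions.
        mask[:, S] = True
        mask[:, R] = True
    elif task == "inverse_dynamics":
        # Consecutive states s_t and s_{t+1} visible; predict action a_t.
        mask[t, S] = True
        mask[t + 1, S] = True
    elif task == "waypoint_conditioning":
        # Current state and a future waypoint visible; predict actions between.
        mask[0, S] = True
        mask[-1, S] = True
    elif task == "random":
        # BERT-style random masking, as used for unified pre-training.
        rng = rng or np.random.default_rng(0)
        mask = rng.random((horizon, 3)) < 0.5
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


if __name__ == "__main__":
    for name in ("behavior_cloning", "offline_rl", "inverse_dynamics",
                 "waypoint_conditioning", "random"):
        print(name)
        print(task_mask(name, horizon=4, t=1).astype(int))
```

Running the sketch prints one 0/1 grid per task, making visible that behavior cloning, offline RL, inverse dynamics, and waypoint conditioning differ only in which entries of the trajectory are revealed to the model.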
Related papers
- Regularized Conditional Diffusion Model for Multi-Task Preference Alignment [43.86042557447689]
Sequential decision-making should align with human intents and exhibit versatility across various tasks.
Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions.
In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making.
arXiv Detail & Related papers (2024-04-07T11:20:32Z)
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- UniMASK: Unified Inference in Sequential Decision Problems [17.09745648221254]
We introduce the UniMASK framework, which provides a unified way to specify models which can be trained on many different sequential decision-making tasks.
A single UniMASK model is often capable of carrying out many tasks with performance similar to or better than single-task models.
arXiv Detail & Related papers (2022-11-20T04:54:49Z)
- Multi-Order Networks for Action Unit Detection [7.971065005161565]
Multi-Order Network (MONET) is a multi-task learning method with joint task order optimization.
We show that MONET significantly advances the state of the art in Facial Action Unit detection.
arXiv Detail & Related papers (2022-02-01T14:58:21Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
- Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)