Multi-Game Decision Transformers
- URL: http://arxiv.org/abs/2205.15241v1
- Date: Mon, 30 May 2022 16:55:38 GMT
- Title: Multi-Game Decision Transformers
- Authors: Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman,
Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski,
Igor Mordatch
- Abstract summary: We show that a single transformer-based model can play a suite of up to 46 Atari games simultaneously at close-to-human performance.
We compare several approaches in this multi-game setting, such as online and offline RL methods and behavioral cloning.
We find that our Multi-Game Decision Transformer models offer the best scalability and performance.
- Score: 49.257185338595434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A longstanding goal of the field of AI is a strategy for compiling diverse
experience into a highly capable, generalist agent. In the subfields of vision
and language, this was largely achieved by scaling up transformer-based models
and training them on large, diverse datasets. Motivated by this progress, we
investigate whether the same strategy can be used to produce generalist
reinforcement learning agents. Specifically, we show that a single
transformer-based model - with a single set of weights - trained purely offline
can play a suite of up to 46 Atari games simultaneously at close-to-human
performance. When trained and evaluated appropriately, we find that the same
trends observed in language and vision hold, including scaling of performance
with model size and rapid adaptation to new games via fine-tuning. We compare
several approaches in this multi-game setting, such as online and offline RL
methods and behavioral cloning, and find that our Multi-Game Decision
Transformer models offer the best scalability and performance. We release the
pre-trained models and code to encourage further research in this direction.
Additional information, videos and code can be seen at:
sites.google.com/view/multi-game-transformers
Related papers
- Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Reinforced World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens data.
Our largest agent, with 150 million parameters, 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on averange.
arXiv Detail & Related papers (2024-10-01T10:25:03Z) - Scaling Laws for Imitation Learning in Single-Agent Games [29.941613597833133]
We investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack.
We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents.
arXiv Detail & Related papers (2023-07-18T16:43:03Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Instruction-Following Agents with Multimodal Transformer [95.70039658112873]
We propose a simple yet effective model for robots to solve instruction-following tasks in vision-based environments.
Our method consists of a multimodal transformer that encodes visual observations and language instructions.
We show that this unified transformer model outperforms all state-of-the-art pre-trained or trained-from-scratch methods in both single-task and multi-task settings.
arXiv Detail & Related papers (2022-10-24T17:46:47Z) - Probing Transfer in Deep Reinforcement Learning without Task Engineering [26.637254541454773]
We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents.
Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway.
We show that zero-shot transfer from the basic games to their variations is possible, but the variance in performance is also largely explained by interactions between factors.
arXiv Detail & Related papers (2022-10-22T13:40:12Z) - Leveraging Transformers for StarCraft Macromanagement Prediction [1.5469452301122177]
We introduce a transformer-based neural architecture for two key StarCraft II macromanagement tasks: global state and build order prediction.
Unlike recurrent neural networks which suffer from a recency bias, transformers are able to capture patterns across very long time horizons.
One key advantage of transformers is their ability to generalize well, and we demonstrate that our model achieves an even better accuracy when used in a transfer learning setting.
arXiv Detail & Related papers (2021-10-11T15:12:21Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.