Adaptive Transformers in RL
- URL: http://arxiv.org/abs/2004.03761v1
- Date: Wed, 8 Apr 2020 01:03:10 GMT
- Title: Adaptive Transformers in RL
- Authors: Shakti Kumar, Jerrod Parker, Panteha Naderian
- Abstract summary: Recent developments in Transformers have opened new areas of research in partially observable reinforcement learning tasks.
Results from late 2019 showed that Transformers are able to outperform LSTMs on both memory-intensive and reactive tasks.
- Score: 6.292138336765965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in Transformers have opened interesting new areas of
research in partially observable reinforcement learning tasks. Results from
late 2019 showed that Transformers are able to outperform LSTMs on both
memory-intensive and reactive tasks. In this work we first partially replicate
the results shown in Stabilizing Transformers in RL on both reactive and
memory-based environments. We then show performance improvement coupled with
reduced computation when adding adaptive attention span to this Stable
Transformer on the challenging DMLab30 environment. The code for all our
experiments and models is available at
https://github.com/jerrodparker20/adaptive-transformers-in-rl.
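For context, here is a minimal sketch of the adaptive attention span mechanism (Sukhbaatar et al., 2019) that the abstract describes adding to the Stable Transformer. The class name, the soft-mask ramp, and the PyTorch wiring are illustrative assumptions, not code taken from the linked repository.

```python
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    """Soft mask over attention weights with a learnable span per head.

    Each head learns a span in [0, max_span]; attention weights at relative
    distance d from the query are scaled by clamp((span + ramp - d) / ramp, 0, 1)
    and renormalized, so training can shrink spans and cut computation.
    """

    def __init__(self, n_heads: int, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span = max_span
        self.ramp = ramp
        # one trainable span fraction per head, initialized to the full span
        self.z = nn.Parameter(torch.ones(n_heads, 1, 1))

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, n_heads, query_len, key_len), softmax already applied;
        # keys are ordered oldest -> newest, the last key is the current step
        key_len = attn.size(-1)
        dist = torch.arange(key_len - 1, -1, -1, device=attn.device, dtype=attn.dtype)
        span = self.z.clamp(0, 1) * self.max_span                    # (n_heads, 1, 1)
        mask = ((span + self.ramp - dist) / self.ramp).clamp(0, 1)   # (n_heads, 1, key_len)
        attn = attn * mask.unsqueeze(0)                              # broadcast over batch and queries
        # renormalize so each query's weights still sum to one
        return attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-8)
```

In the adaptive-span formulation, an L1 penalty on the learned spans is what pushes them to shrink during training; shorter spans are the source of the reduced computation the abstract reports.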
Related papers
- Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
arXiv Detail & Related papers (2024-09-28T13:24:11Z)
- TransformerFAM: Feedback attention is working memory [18.005034679674274]
We propose a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations.
TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models (a schematic sketch of the feedback idea appears after this list).
arXiv Detail & Related papers (2024-04-14T07:43:45Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches; this is the first time a simple transformer-based model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- A Survey on Transformers in Reinforcement Learning [66.23773284875843]
Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings.
Recently, a similar surge of using Transformers has appeared in the domain of reinforcement learning (RL), but it is faced with unique design choices and challenges brought by the nature of RL.
This paper systematically reviews motivations and progress on using Transformers in RL, provides a taxonomy of existing works, discusses each sub-field, and summarizes future prospects.
arXiv Detail & Related papers (2023-01-08T14:04:26Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., learn models by gradient descent in their forward pass (a toy numerical check of this equivalence appears after this list).
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer (a generic sketch of the group-wise idea appears after this list).
We apply LW-Transformer to a set of Transformer-based networks and evaluate them quantitatively on three vision-and-language tasks and six benchmark datasets.
Experimental results show that, while saving a large number of parameters and computations, LW-Transformer achieves highly competitive performance against the original Transformer networks on vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- Gaze Estimation using Transformer [14.26674946195107]
We consider two forms of vision transformers: pure transformers and hybrid transformers.
We first follow the popular ViT and employ a pure transformer to estimate gaze from images.
We then preserve the convolutional layers and combine CNNs with transformers in a hybrid model.
arXiv Detail & Related papers (2021-05-30T04:06:29Z)
- Stabilizing Transformer-Based Action Sequence Generation For Q-Learning [5.707122938235432]
The goal is a simple Transformer-based Deep Q-Learning method that is stable over several environments.
The proposed method can match the performance of classic Q-learning on control environments while showing potential on some selected Atari benchmarks.
arXiv Detail & Related papers (2020-10-23T22:55:04Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
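The feedback-attention entry (TransformerFAM) above is illustrated by the rough sketch below: a block that carries a few "memory" tokens across segments and updates them with the same self-attention layer used for the ordinary tokens, so the network attends to its own latent state without additional weights. This is a simplified reading of the summary, not the TransformerFAM architecture itself; all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class FeedbackBlock(nn.Module):
    """Illustrative feedback-attention sketch: memory tokens are carried across
    segments and refreshed by the block's own attention layer (no extra weights)."""

    def __init__(self, d_model: int, n_heads: int, n_memory: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.n_memory = n_memory

    def forward(self, x: torch.Tensor, memory: torch.Tensor):
        # x: (batch, seq, d_model); memory: (batch, n_memory, d_model)
        tokens = torch.cat([memory, x], dim=1)   # attend over memory + current segment
        out, _ = self.attn(tokens, tokens, tokens)
        new_memory = out[:, :self.n_memory]      # updated latent state, fed to the next segment
        y = out[:, self.n_memory:]               # ordinary token outputs
        return y, new_memory
```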
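To make the mesa-optimizer entry ("Transformers learn in-context by gradient descent") concrete, here is a small self-contained numerical check under assumed toy conditions, not code from the cited paper: for linear regression with weights initialized at zero, the prediction after one gradient-descent step equals a linear-attention-style sum of context targets weighted by query-context inner products.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta = 8, 3, 0.1
X = rng.normal(size=(n, d))        # context inputs x_i
y = X @ rng.normal(size=d)         # context targets y_i
x_q = rng.normal(size=d)           # query input

# One gradient-descent step on the squared loss, starting from w = 0:
#   w_1 = eta * sum_i y_i * x_i
w1 = eta * (y @ X)
gd_prediction = x_q @ w1

# The same prediction written as unnormalized linear attention: the query
# attends to each context point with score x_q . x_i and accumulates
# eta * score * y_i.
scores = X @ x_q
attention_prediction = eta * float(scores @ y)

assert np.isclose(gd_prediction, attention_prediction)
```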
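As a rough illustration of the parameter-saving idea behind the group-wise transformation entry (LW-Transformer): split the feature dimension into groups and project each group with its own smaller linear map, reducing a d x d projection to g blocks of (d/g) x (d/g). This is an assumed, generic sketch, not the LW-Transformer design itself.

```python
import torch
import torch.nn as nn

class GroupWiseLinear(nn.Module):
    """Generic group-wise projection: an independent small linear map per
    feature group, cutting parameters from d*d to d*d/groups."""

    def __init__(self, d_model: int, groups: int):
        super().__init__()
        assert d_model % groups == 0
        self.groups = groups
        self.proj = nn.ModuleList(
            nn.Linear(d_model // groups, d_model // groups) for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # split the last dimension into groups, project each, and re-concatenate
        chunks = x.chunk(self.groups, dim=-1)
        return torch.cat([p(c) for p, c in zip(self.proj, chunks)], dim=-1)
```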
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.