Emergent Agentic Transformer from Chain of Hindsight Experience
- URL: http://arxiv.org/abs/2305.16554v1
- Date: Fri, 26 May 2023 00:43:02 GMT
- Title: Emergent Agentic Transformer from Chain of Hindsight Experience
- Authors: Hao Liu and Pieter Abbeel
- Abstract summary: We show that, for the first time, a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
- Score: 96.56164427726203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large transformer models powered by diverse data and model scale have
dominated natural language modeling and computer vision and pushed the frontier
of multiple AI areas. In reinforcement learning (RL), despite many efforts on
transformer-based policies, a key limitation remains: current transformer-based
policies cannot learn by directly combining information from multiple
sub-optimal trials. In this work, we address this issue using the recently
proposed chain of hindsight to relabel experience: we train a transformer on a
sequence of trajectories sorted in ascending order of total reward. Our method
relabels the target return of each trajectory to the maximum total reward in
the sequence and trains an autoregressive model to predict actions conditioned
on past states, actions, rewards, target returns, and task completion tokens.
The resulting model, Agentic Transformer (AT), can learn to improve upon itself
both at training and test time. As we show on the D4RL and ExoRL benchmarks, to
the best of our knowledge, this is the first time that a simple
transformer-based model performs competitively with both temporal-difference
and imitation-learning-based approaches, even from sub-optimal data. Our Agentic
Transformer also shows a promising scaling trend in which bigger models
consistently improve results.
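The relabeling step described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Trajectory` class, the functions `chain_of_hindsight` and `action_prediction_pairs`, scalar states, the Decision-Transformer-style per-step return-to-go, the rule for the task completion flag, and the token layout are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical trajectory container; scalar states keep the example small.
@dataclass
class Trajectory:
    states: List[float]
    actions: List[int]
    rewards: List[float]

    @property
    def total_reward(self) -> float:
        return sum(self.rewards)

# One timestep after relabeling: (return_to_go, state, action, reward, completion_flag)
Step = Tuple[float, float, int, float, int]


def chain_of_hindsight(trajs: List[Trajectory]) -> List[List[Step]]:
    """Sort trajectories by total reward (ascending) and relabel each one's
    target return to the best total reward found in the chain."""
    ordered = sorted(trajs, key=lambda t: t.total_reward)
    best = ordered[-1].total_reward if ordered else 0.0
    chain: List[List[Step]] = []
    for traj in ordered:
        # Assumed completion rule: 1 only if this trajectory reaches the target.
        done = int(traj.total_reward >= best)
        rtg = best  # every trajectory is conditioned on the same relabeled target
        steps: List[Step] = []
        for s, a, r in zip(traj.states, traj.actions, traj.rewards):
            steps.append((rtg, s, a, r, done))
            rtg -= r  # assumed return-to-go: target minus reward collected so far
        chain.append(steps)
    return chain


def action_prediction_pairs(chain: List[List[Step]]):
    """Flatten the chain into one token stream and emit (context, action)
    pairs for next-action prediction; the token layout is an assumption."""
    history: list = []
    pairs = []
    for steps in chain:
        for rtg, s, a, r, _ in steps:
            context = history + [("rtg", rtg), ("state", s)]
            pairs.append((context, a))
            history = context + [("action", a), ("reward", r)]
        if steps:
            # Completion token closes each trajectory (placement assumed).
            history = history + [("done", steps[-1][4])]
    return pairs


# Toy usage: two attempts at the same task with different total rewards.
trajs = [
    Trajectory(states=[0.0, 1.0], actions=[1, 0], rewards=[1.0, 2.0]),  # total 3.0
    Trajectory(states=[0.0, 1.0], actions=[0, 0], rewards=[0.0, 1.0]),  # total 1.0
]
chain = chain_of_hindsight(trajs)
pairs = action_prediction_pairs(chain)
```

Here the worse attempt comes first in the chain and both attempts are conditioned on the same target of 3.0, so the model sees a weaker trial in context before predicting the actions of a better one. Any actual training loop, tokenizer, or model architecture is outside the scope of this sketch.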
Related papers
- Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
Date: 2024-09-28T13:24:11Z
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% computational cost of training from scratch.
Date: 2023-03-02T05:21:18Z
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformers, originally devised for natural language processing, have also achieved significant success in computer vision.
We group existing developments into two categories: architecture enhancement and trajectory optimization.
We examine the main applications of transformer-based RL (TRL) in robotic manipulation, text-based games, navigation and autonomous driving.
Date: 2022-12-29T03:15:59Z
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
Date: 2022-12-15T09:21:21Z
- Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats, specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
Date: 2022-12-05T23:50:46Z
- TransDreamer: Reinforcement Learning with Transformer World Models [30.387428559614186]
We propose a transformer-based Model-Based Reinforcement Learning agent, called TransDreamer.
We first introduce the Transformer State-Space Model, a world model that leverages a transformer for dynamics predictions. We then share this world model with a transformer-based policy network and obtain stability in training a transformer-based RL agent.
In experiments, we apply the proposed model to 2D visual RL and 3D first-person visual RL tasks both requiring long-range memory access for memory-based reasoning. We show that the proposed model outperforms Dreamer in these complex tasks.
Date: 2022-02-19T00:30:52Z
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, short for 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
Date: 2021-04-26T13:13:03Z
- Stabilizing Transformer-Based Action Sequence Generation For Q-Learning [5.707122938235432]
The goal is a simple Transformer-based Deep Q-Learning method that is stable over several environments.
The proposed method can match the performance of classic Q-learning on control environments while showing potential on some selected Atari benchmarks.
Date: 2020-10-23T22:55:04Z
- Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models.
Date: 2020-02-18T16:51:05Z
This list is automatically generated from the titles and abstracts of the papers on this site.