Emergent Agentic Transformer from Chain of Hindsight Experience
- URL: http://arxiv.org/abs/2305.16554v1
- Date: Fri, 26 May 2023 00:43:02 GMT
- Title: Emergent Agentic Transformer from Chain of Hindsight Experience
- Authors: Hao Liu and Pieter Abbeel
- Abstract summary: We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
- Score: 96.56164427726203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large transformer models powered by diverse data and model scale have
dominated natural language modeling and computer vision and pushed the frontier
of multiple AI areas. In reinforcement learning (RL), despite many efforts into
transformer-based policies, a key limitation is that current transformer-based
policies cannot learn by directly combining information from multiple
sub-optimal trials. In this work, we address this issue using the recently
proposed chain of hindsight to relabel experience: we train a transformer on a
sequence of trajectories sorted in ascending order of total reward. Our method
relabels the target return of each trajectory to the maximum total reward in
the sequence of trajectories and trains an autoregressive model to predict
actions conditioned on past states, actions, rewards, target returns, and
task-completion tokens. The resulting model, Agentic Transformer (AT), can
learn to improve upon itself at both training and test time. As we show on the
D4RL and ExoRL benchmarks, to the best of our knowledge, this is the first time
that a simple transformer-based model performs competitively with both
temporal-difference and imitation-learning-based approaches, even from
sub-optimal data. Our Agentic Transformer also shows a promising scaling trend:
bigger models consistently improve results.
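The relabeling step described in the abstract can be sketched in a few lines: sort trajectories in ascending order of total reward, relabel every trajectory's target return to the maximum total reward in the chain, and flatten the result into one autoregressive training sequence. This is a minimal illustrative sketch assuming integer stand-ins for states and actions; the `Trajectory` class, the tuple layout, and the use of a binary task-completion flag are simplifications, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Trajectory:
    states: List[int]     # stand-ins for real observations
    actions: List[int]
    rewards: List[float]

    @property
    def total_reward(self) -> float:
        return sum(self.rewards)

def chain_of_hindsight_sequence(
    trajectories: List[Trajectory],
) -> List[Tuple[int, int, float, float, int]]:
    """Build one training sequence from several sub-optimal trials.

    Trajectories are sorted ascending by total reward; every token group is
    relabeled with the maximum total reward in the chain as its target return.
    """
    ordered = sorted(trajectories, key=lambda t: t.total_reward)
    target_return = ordered[-1].total_reward
    sequence = []
    for i, traj in enumerate(ordered):
        # Completion flag is 1 only for the best (last) trajectory in the chain.
        done = int(i == len(ordered) - 1)
        for s, a, r in zip(traj.states, traj.actions, traj.rewards):
            # One token group per timestep:
            # (state, action, reward, shared target return, completion flag)
            sequence.append((s, a, r, target_return, done))
    return sequence
```

An autoregressive model trained on such sequences sees worse trials before better ones, which is what lets it learn to improve on its own earlier attempts at inference time.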
Related papers
- Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
arXiv Detail & Related papers (2024-09-28T13:24:11Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformer, originally devised for natural language processing, has also attested significant success in computer vision.
We group existing developments in two categories: architecture enhancement and trajectory optimization.
We examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
- Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Stabilizing Transformer-Based Action Sequence Generation For Q-Learning [5.707122938235432]
The goal is a simple Transformer-based Deep Q-Learning method that is stable over several environments.
The proposed method can match the performance of classic Q-learning on control environments while showing potential on some selected Atari benchmarks.
arXiv Detail & Related papers (2020-10-23T22:55:04Z)
- Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models.
arXiv Detail & Related papers (2020-02-18T16:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.