Related papers: Accelerating Transformers in Online RL

URL: http://arxiv.org/abs/2509.26137v1
Date: Tue, 30 Sep 2025 11:57:14 GMT
Title: Accelerating Transformers in Online RL
Authors: Daniil Zelezetsky, Alexey K. Kovalev, Aleksandr I. Panov
Abstract summary: Transformer-based models in Reinforcement Learning (RL) bring a wide range of implementation challenges, especially in model-free online RL. We propose a method that uses the Accelerator policy as a transformer's trainer. We show that applying our algorithm not only enables stable training of transformers but also reduces training time on image-based environments by up to a factor of two.
Score: 47.99822253865053
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The appearance of transformer-based models in Reinforcement Learning (RL) has expanded the horizons of possibilities in robotics tasks, but it has simultaneously brought a wide range of challenges during their implementation, especially in model-free online RL. Some of the existing learning algorithms cannot be easily implemented with transformer-based models due to the instability of the latter. In this paper, we propose a method that uses the Accelerator policy as a transformer's trainer. The Accelerator, a simpler and more stable model, interacts with the environment independently while simultaneously training the transformer through behavior cloning during the first stage of the proposed algorithm. In the second stage, the pretrained transformer starts to interact with the environment in a fully online setting. As a result, this model-free algorithm improves the transformer's performance and helps it train online faster and more stably. By conducting experiments on both state-based and image-based ManiSkill environments, as well as on MuJoCo tasks in MDP and POMDP settings, we show that applying our algorithm not only enables stable training of transformers but also reduces training time on image-based environments by up to a factor of two. Moreover, it decreases the required replay buffer size in off-policy methods to 10-20 thousand, which significantly lowers the overall computational demands.
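
The two-stage schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D environment, the fixed proportional controller standing in for the Accelerator, the single-gain StudentPolicy standing in for the transformer, and all hyperparameters are assumptions made for illustration. In the paper the Accelerator is itself trained and the second stage continues with online RL updates; both are left out here.

```python
import random
from collections import deque


def accelerator_policy(state):
    """Hypothetical stand-in for the Accelerator: a fixed proportional controller."""
    return -0.5 * state


class StudentPolicy:
    """Hypothetical stand-in for the transformer: action = w * state, one learnable gain."""

    def __init__(self):
        self.w = 0.0

    def act(self, state):
        return self.w * state

    def bc_update(self, batch, lr=0.5):
        # One gradient step on the mean squared error between student and teacher actions.
        grad = sum(2.0 * (self.w * s - a) * s for s, a in batch) / len(batch)
        self.w -= lr * grad


def env_step(state, action):
    """Toy 1-D environment (an assumption): the goal is to drive the state to zero."""
    next_state = state + action
    return next_state, -abs(next_state)


def run_two_stage(stage1_steps=300, stage2_steps=20, buffer_size=100,
                  batch_size=16, episode_len=5, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)  # small buffer of (state, teacher action) pairs
    student = StudentPolicy()

    # Stage 1: the Accelerator acts in the environment; the student is cloned from its data.
    state = rng.uniform(-1.0, 1.0)
    for step in range(stage1_steps):
        if step % episode_len == 0:
            state = rng.uniform(-1.0, 1.0)  # start a new short episode
        action = accelerator_policy(state)
        buffer.append((state, action))
        batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
        student.bc_update(batch)
        state, _ = env_step(state, action)

    # Stage 2: the pretrained student interacts on its own.
    # The online RL update would go here; it is omitted from this sketch.
    state, total_reward = 1.0, 0.0
    for _ in range(stage2_steps):
        state, reward = env_step(state, student.act(state))
        total_reward += reward
    return student.w, total_reward
```

Calling run_two_stage() returns the student's learned gain and its stage-2 return. In this toy setting the gain should approach the teacher's value of -0.5, which is the only sense in which the sketch mirrors the behavior-cloning stage.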

Related papers

EcoSpa: Efficient Transformer Training with Coupled Sparsity [79.5008521101473]
Transformers have become the backbone of modern AI, yet their high computational demands pose critical system challenges. We introduce EcoSpa, an efficient structured sparse training method that jointly evaluates and sparsifies coupled weight matrix pairs.
arXiv Detail & Related papers (2025-11-09T11:23:43Z)
CSDformer: A Conversion Method for Fully Spike-Driven Transformer [11.852241487470797]
Spike-based transformer is a novel architecture aiming to enhance the performance of spiking neural networks. We propose CSDformer, a novel conversion method for fully spike-driven transformers. CSDformer achieves high performance under ultra-low latency, while dramatically reducing both computational complexity and training overhead.
arXiv Detail & Related papers (2025-09-22T07:55:03Z)
Quantization-Free Autoregressive Action Transformer [18.499864366974613]
Current transformer-based imitation learning approaches introduce discrete action representations and train an autoregressive transformer decoder on the resulting latent code. We propose a quantization-free method that leverages Generative Infinite-Vocabulary Transformers (GIVT) as a direct, continuous policy parametrization for autoregressive transformers.
arXiv Detail & Related papers (2025-03-18T13:50:35Z)
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining [25.669038513039357]
This paper provides a theoretical framework that analyzes supervised pretraining for in-context reinforcement learning. We show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms.
arXiv Detail & Related papers (2023-10-12T17:55:02Z)
Decision S4: Efficient Sequence-Based RL via State Spaces Layers [87.3063565438089]
We present an off-policy training procedure that works with trajectories, while still maintaining the training efficiency of the S4 model. We also present an on-policy training procedure that is trained in a recurrent manner, benefits from long-range dependencies, and is based on a novel stable actor-critic mechanism.
arXiv Detail & Related papers (2023-06-08T13:03:53Z)
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL. We show that transformers can implement a broad class of standard machine learning algorithms in context. A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We show how trained Transformers become mesa-optimizers, i.e., learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
Stabilizing Transformer-Based Action Sequence Generation For Q-Learning [5.707122938235432]
The goal is a simple Transformer-based Deep Q-Learning method that is stable over several environments. The proposed method can match the performance of classic Q-learning on control environments while showing potential on some selected Atari benchmarks.
arXiv Detail & Related papers (2020-10-23T22:55:04Z)
AutoTrans: Automating Transformer Design via Reinforced Architecture Search [52.48985245743108]
This paper empirically explores how to set layer-norm, whether to scale, number of layers, number of heads, activation function, etc., so that one can obtain a transformer architecture that better suits the tasks at hand. Experiments on CoNLL03, Multi-30k, IWSLT14 and WMT-14 show that the searched transformer model can outperform the standard transformers.
arXiv Detail & Related papers (2020-09-04T08:46:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.