Decision S4: Efficient Sequence-Based RL via State Spaces Layers
- URL: http://arxiv.org/abs/2306.05167v1
- Date: Thu, 8 Jun 2023 13:03:53 GMT
- Title: Decision S4: Efficient Sequence-Based RL via State Spaces Layers
- Authors: Shmuel Bar-David, Itamar Zimerman, Eliya Nachmani, Lior Wolf
- Abstract summary: We present two algorithms: an off-policy training procedure that works with trajectories while still maintaining the training efficiency of the S4 model, and an on-policy training procedure that is trained in a recurrent manner, benefits from long-range dependencies, and is based on a novel stable actor-critic mechanism.
- Score: 87.3063565438089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, sequence learning methods have been applied to the problem of
off-policy Reinforcement Learning, including the seminal work on Decision
Transformers, which employs transformers for this task. Since transformers are
parameter-heavy, cannot benefit from history longer than a fixed window size,
and are not computed using recurrence, we set out to investigate the
suitability of the S4 family of models, which are based on state-space layers
and have been shown to outperform transformers, especially in modeling
long-range dependencies. In this work we present two main algorithms: (i) an
off-policy training procedure that works with trajectories, while still
maintaining the training efficiency of the S4 model. (ii) An on-policy training
procedure that is trained in a recurrent manner, benefits from long-range
dependencies, and is based on a novel stable actor-critic mechanism. Our
results indicate that our method outperforms multiple variants of decision
transformers, as well as the other baseline methods on most tasks, while
reducing the latency, number of parameters, and training time by several orders
of magnitude, making our approach more suitable for real-world RL.
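For context on why recurrence matters here, the sketch below shows the discrete linear recurrence that state-space layers of the S4 family compute, x' = A x + B u, y = C x + D u, and why a recurrent rollout costs a constant amount per step regardless of history length. The layer sizes, random initialization, and toy policy head are illustrative assumptions, not this paper's actual architecture or training procedure.

```python
import numpy as np

class StateSpaceLayer:
    """Minimal discrete linear state-space layer: x' = A x + B u, y = C x + D u.

    Illustrative only: real S4 layers use a structured state matrix, a learned
    discretization step, and run many such SSMs in parallel.
    """

    def __init__(self, state_dim, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.B = rng.normal(scale=0.1, size=(state_dim, in_dim))
        self.C = rng.normal(scale=0.1, size=(out_dim, state_dim))
        self.D = rng.normal(scale=0.1, size=(out_dim, in_dim))
        self.x = np.zeros(state_dim)  # hidden state carried across time steps

    def step(self, u):
        # One recurrent step: O(1) w.r.t. history length, unlike attention,
        # whose per-step cost grows with the context window.
        self.x = self.A @ self.x + self.B @ u
        return self.C @ self.x + self.D @ u

# Hypothetical rollout: the hidden state summarizes the whole trajectory so far,
# so the policy can use arbitrarily long history at fixed per-step cost.
obs_dim, act_dim = 4, 2
layer = StateSpaceLayer(state_dim=16, in_dim=obs_dim, out_dim=act_dim)
obs = np.zeros(obs_dim)
for t in range(1000):
    action = np.tanh(layer.step(obs))  # toy deterministic policy head
    obs = np.random.randn(obs_dim)     # stand-in for an environment transition
```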
Related papers
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL)
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM)
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
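As a rough, hypothetical illustration of the idea summarized above (not QT's actual objective or architecture), the snippet below adds an action-value-maximizing term to a conditional-sequence-modeling imitation loss; the networks, batch shapes, and trade-off weight are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder networks; QT's real model is a Transformer plus a learned Q-function.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))      # state -> action
q_func = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 1))  # (state, action) -> value
alpha = 0.5  # hypothetical trade-off weight

states = torch.randn(32, 8)          # toy batch standing in for trajectory tokens
dataset_actions = torch.randn(32, 2)

pred_actions = policy(states)
csm_loss = nn.functional.mse_loss(pred_actions, dataset_actions)   # imitate the dataset
q_values = q_func(torch.cat([states, pred_actions], dim=-1))
loss = csm_loss - alpha * q_values.mean()   # also maximize action-values, as the summary describes
loss.backward()
```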
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- Dynamic Layer Tying for Parameter-Efficient Transformers [65.268245109828]
We employ Reinforcement Learning to select layers during training and tie them together.
This facilitates weight sharing, reduces the number of trainable parameters, and also serves as an effective regularization technique.
In particular, the memory consumption during training is up to one order of magnitude less than the conventional training method.
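A minimal sketch of the weight-sharing effect described above, assuming a fixed tying plan rather than the RL-chosen one used in the paper: reusing the same layer object at several depths means only one set of parameters is trained per group.

```python
import torch.nn as nn

def build_tied_encoder(num_layers, tying_plan, d_model=64, nhead=4):
    """tying_plan[i] gives the index of the layer whose weights depth i reuses.

    E.g. [0, 0, 2, 2] builds 4 depths but only 2 distinct parameter sets.
    In the paper this plan is selected dynamically by an RL controller during
    training; here it is just a fixed list for illustration.
    """
    unique = {}
    layers = []
    for i in range(num_layers):
        src = tying_plan[i]
        if src not in unique:
            unique[src] = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        layers.append(unique[src])  # same module object => shared weights
    return nn.ModuleList(layers)

encoder = build_tied_encoder(4, tying_plan=[0, 0, 2, 2])
n_params = sum(p.numel() for p in encoder.parameters())  # shared weights counted once
```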
arXiv Detail & Related papers (2024-01-23T14:53:20Z)
- On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers [47.77328392236625]
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts.
We introduce a two-stage training procedure, where we first optimize the task-specific parameters and then train the classifier using the same parameter-selection procedure that is used at inference time.
Our method achieves results that are either superior or on par with the state of the art while being computationally cheaper.
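A hedged sketch of what such a two-stage procedure can look like, assuming that LayerNorm parameters are the task-specific ones being tuned (the paper's exact selection and training details are not reproduced here):

```python
import torch.nn as nn

# Hypothetical backbone; only its LayerNorm parameters are treated as task-specific.
backbone = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.ReLU(),
                         nn.Linear(32, 32), nn.LayerNorm(32), nn.ReLU())
classifier = nn.Linear(32, 10)

# Stage 1: only the task-specific (LayerNorm) parameters are trainable.
for module in backbone.modules():
    trainable = isinstance(module, nn.LayerNorm)
    for p in module.parameters(recurse=False):
        p.requires_grad = trainable
# ... optimize the LayerNorm parameters on the new task here ...

# Stage 2: freeze the backbone entirely and train only the classifier head,
# selecting parameters the same way as at inference time.
for p in backbone.parameters():
    p.requires_grad = False
for p in classifier.parameters():
    p.requires_grad = True
```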
arXiv Detail & Related papers (2023-08-18T15:11:16Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
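A loose sketch of this supervised pretraining setup, assuming the model receives a query state together with an in-context dataset of interactions and is trained to predict an optimal action for the query state; the tiny MLP and random data stand in for the transformer and the task distribution.

```python
import torch
import torch.nn as nn

state_dim, act_dim, ctx_transitions = 4, 3, 8
ctx_dim = ctx_transitions * (state_dim + act_dim + 1)   # flattened (s, a, r) tuples

# Toy stand-in for the pretrained model: query state + in-context data -> action logits.
model = nn.Sequential(nn.Linear(state_dim + ctx_dim, 128), nn.ReLU(),
                      nn.Linear(128, act_dim))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(10):                                      # pretraining loop over sampled tasks
    query_state = torch.randn(32, state_dim)
    context = torch.randn(32, ctx_dim)                   # in-context interaction data
    optimal_action = torch.randint(0, act_dim, (32,))    # label from a task with known optimum
    logits = model(torch.cat([query_state, context], dim=-1))
    loss = nn.functional.cross_entropy(logits, optimal_action)
    opt.zero_grad(); loss.backward(); opt.step()
```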
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, which is the first time such a model has achieved this.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL [0.0]
Decision Transformer (DT) combines the conditional policy approach and a transformer architecture.
DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy.
We propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT.
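One way dynamic-programming value estimates can supply the missing stitching ability is to relabel a trajectory's return-to-go targets with a learned Q-function before conditional sequence training. The sketch below illustrates that idea under stated assumptions; it is not necessarily QDT's exact recipe.

```python
import numpy as np

def relabel_returns_to_go(rewards, states, actions, q_value, gamma=0.99):
    """Replace the naive Monte-Carlo return-to-go with a DP-informed estimate.

    q_value(s, a) is assumed to be a Q-function learned from the offline data;
    the relabeled targets let sub-trajectories from different episodes be
    'stitched' into better-than-demonstrated behavior. Illustrative assumption only.
    """
    T = len(rewards)
    rtg = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running                  # naive return-to-go
        rtg[t] = max(running, q_value(states[t], actions[t]))   # upgrade with the DP estimate
    return rtg

# Toy usage with a dummy Q-function.
rewards = np.array([0.0, 0.0, 1.0])
states = np.zeros((3, 4)); actions = np.zeros((3, 2))
rtg = relabel_returns_to_go(rewards, states, actions, q_value=lambda s, a: 0.5)
```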
arXiv Detail & Related papers (2022-09-08T18:26:39Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- A Practical Survey on Faster and Lighter Transformers [0.9176056742068811]
The Transformer is a model solely based on the attention mechanism that is able to relate any two positions of the input sequence.
It has improved the state-of-the-art across numerous sequence modelling tasks.
However, its effectiveness comes at the expense of a quadratic computational and memory complexity with respect to the sequence length.
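That quadratic cost is visible directly in the computation: self-attention forms an L-by-L score matrix for a length-L sequence, so time and memory grow with L squared. A minimal illustration (not any particular library's implementation):

```python
import numpy as np

def self_attention(X):
    """Vanilla single-head self-attention over a length-L, dim-d sequence X.

    The score matrix has shape (L, L): this is the quadratic time/memory cost
    that the faster-Transformer variants surveyed above try to reduce.
    """
    L, d = X.shape
    Q, K, V = X, X, X                       # identity projections, for brevity
    scores = Q @ K.T / np.sqrt(d)           # (L, L) -- grows quadratically with L
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

X = np.random.randn(512, 64)
out = self_attention(X)                     # doubling L quadruples the score matrix
```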
arXiv Detail & Related papers (2021-03-26T17:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.