Facing Off World Model Backbones: RNNs, Transformers, and S4
- URL: http://arxiv.org/abs/2307.02064v2
- Date: Thu, 9 Nov 2023 16:50:43 GMT
- Title: Facing Off World Model Backbones: RNNs, Transformers, and S4
- Authors: Fei Deng, Junyeong Park, Sungjin Ahn
- Abstract summary: World models are a fundamental component in model-based reinforcement learning (MBRL).
We propose S4WM, the first world model compatible with parallelizable SSMs including S4 and its variants.
Our findings demonstrate that S4WM outperforms Transformer-based world models in terms of long-term memory, while exhibiting greater efficiency during training and imagination.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models are a fundamental component in model-based reinforcement
learning (MBRL). To perform temporally extended and consistent simulations of
the future in partially observable environments, world models need to possess
long-term memory. However, state-of-the-art MBRL agents, such as Dreamer,
predominantly employ recurrent neural networks (RNNs) as their world model
backbone, which have limited memory capacity. In this paper, we seek to explore
alternative world model backbones for improving long-term memory. In
particular, we investigate the effectiveness of Transformers and Structured
State Space Sequence (S4) models, motivated by their remarkable ability to
capture long-range dependencies in low-dimensional sequences and their
complementary strengths. We propose S4WM, the first world model compatible with
parallelizable SSMs including S4 and its variants. By incorporating latent
variable modeling, S4WM can efficiently generate high-dimensional image
sequences through latent imagination. Furthermore, we extensively compare RNN-,
Transformer-, and S4-based world models across four sets of environments, which
we have tailored to assess crucial memory capabilities of world models,
including long-term imagination, context-dependent recall, reward prediction,
and memory-based reasoning. Our findings demonstrate that S4WM outperforms
Transformer-based world models in terms of long-term memory, while exhibiting
greater efficiency during training and imagination. These results pave the way
for the development of stronger MBRL agents.
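As background for the abstract above, the core of S4-style models is a linear state space recurrence that can equivalently be computed as a convolution, which is what makes training parallelizable. The sketch below illustrates this equivalence with toy diagonal matrices; all names and dimensions are illustrative and do not reflect the actual S4WM parameterization.

```python
import numpy as np

# Toy linear state space model (SSM): x_t = A x_{t-1} + B u_t, y_t = C x_t.
# A stable diagonal A keeps the recurrence well-behaved over long sequences.
rng = np.random.default_rng(0)
d_state, seq_len = 4, 16

A = np.diag(rng.uniform(0.5, 0.95, d_state))  # stable diagonal transition
B = rng.standard_normal((d_state, 1)) * 0.1
C = rng.standard_normal((1, d_state))

u = rng.standard_normal(seq_len)  # 1-D input sequence

# Sequential (recurrent) view: one step per time index.
x = np.zeros((d_state, 1))
y_rec = np.empty(seq_len)
for t in range(seq_len):
    x = A @ x + B * u[t]
    y_rec[t] = (C @ x).item()

# Convolutional view: precompute the kernel K_k = C A^k B once, then
# apply it to the whole sequence at once (parallelizable over time).
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item()
              for k in range(seq_len)])
y_conv = np.convolve(u, K)[:seq_len]

print(np.allclose(y_rec, y_conv))  # True
```

The two views produce identical outputs: the recurrence unrolls to y_t = sum_k C A^k B u_{t-k}, which is exactly the convolution with kernel K.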
Related papers
- Automatically Learning Hybrid Digital Twins of Dynamical Systems [56.69628749813084]
Digital Twins (DTs) simulate the states and temporal dynamics of real-world systems.
DTs often struggle to generalize to unseen conditions in data-scarce settings.
In this paper, we propose an evolutionary algorithm (HDTwinGen) to autonomously propose, evaluate, and optimize HDTwins.
arXiv Detail & Related papers (2024-10-31T07:28:22Z) - FACTS: A Factored State-Space Framework For World Modelling [24.08175276756845]
We propose a novel recurrent framework, the FACTored State-space (FACTS) model, for spatial-temporal world modelling.
The FACTS framework constructs a graph-memory with a routing mechanism that learns permutable memory representations.
It consistently outperforms or matches specialised state-of-the-art models, despite its general-purpose world modelling design.
arXiv Detail & Related papers (2024-10-28T11:04:42Z) - Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a world model built on the Mamba state space model (SSM).
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z) - PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model [7.286873011001679]
We propose a purely SSM-based approach with linear complexity for 3D human pose estimation in monocular video.
Specifically, we propose a bidirectional global-local spatio-temporal block that comprehensively models human joint relations within individual frames as well as across frames.
This reordering provides a more logical geometric ordering, resulting in a combined global-local spatial scan.
arXiv Detail & Related papers (2024-08-07T04:38:03Z) - Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on the StarCraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z) - Mastering Memory Tasks with World Models [12.99255437732525]
Current model-based reinforcement learning (MBRL) agents struggle with long-term dependencies.
We present a new method, Recall to Imagine (R2I), to improve temporal coherence.
R2I establishes a new state-of-the-art for challenging memory and credit assignment RL tasks.
arXiv Detail & Related papers (2024-03-07T06:35:59Z) - Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z) - Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z) - TransDreamer: Reinforcement Learning with Transformer World Models [33.34909288732319]
We propose a transformer-based Model-Based Reinforcement Learning agent, called TransDreamer.
We first introduce the Transformer State-Space Model, a world model that leverages a transformer for dynamics predictions. We then share this world model with a transformer-based policy network and obtain stability in training a transformer-based RL agent.
In experiments, we apply the proposed model to 2D visual RL and 3D first-person visual RL tasks both requiring long-range memory access for memory-based reasoning. We show that the proposed model outperforms Dreamer in these complex tasks.
arXiv Detail & Related papers (2022-02-19T00:30:52Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.