Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
- URL: http://arxiv.org/abs/2410.08893v1
- Date: Fri, 11 Oct 2024 15:10:40 GMT
- Title: Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
- Authors: Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang, Vinny Cahill,
- Abstract summary: We propose a state space model (SSM) based world model based on Mamba.
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
- Score: 9.519619751861333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often demands complex and deep architectures, which are expensive to compute and train. Within the world model, dynamics models are particularly crucial for accurate predictions, and various dynamics-model architectures have been explored, each with its own set of challenges. Currently, recurrent neural network (RNN) based world models face issues such as vanishing gradients and difficulty in capturing long-term dependencies effectively. In contrast, use of transformers suffers from the well-known issues of self-attention mechanisms, where both memory and computational complexity scale as $O(n^2)$, with $n$ representing the sequence length. To address these challenges we propose a state space model (SSM) based world model, specifically based on Mamba, that achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and facilitating the use of longer training sequences efficiently. We also introduce a novel sampling method to mitigate the suboptimality caused by an incorrect world model in the early stages of training, combining it with the aforementioned technique to achieve a normalised score comparable to other state-of-the-art model-based RL algorithms using only a 7 million trainable parameter world model. This model is accessible and can be trained on an off-the-shelf laptop. Our code is available at https://github.com/realwenlongwang/drama.git.
Related papers
- Accelerating Model-Based Reinforcement Learning with State-Space World Models [18.71404724458449]
Reinforcement learning (RL) is a powerful approach for robot learning.
However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies.
We propose a new method for accelerating model-based RL using state-space world models.
arXiv Detail & Related papers (2025-02-27T15:05:25Z) - LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z) - Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z) - Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models [9.318262213262866]
We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems.
We make accurate long-horizon predictions with substantially less data than prior methods.
We validate our approach on a real-world Unitree Go1 quadruped robot.
arXiv Detail & Related papers (2024-10-11T18:11:21Z) - Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Reinforced World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens data.
Our largest agent, with 150 million parameters, 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on averange.
arXiv Detail & Related papers (2024-10-01T10:25:03Z) - Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators [78.64101336150419]
Predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling.
An alternative approach to such a full-resolved simulation is using a coarse grid and then correcting its errors through a temporalittext model.
We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation.
arXiv Detail & Related papers (2024-08-09T17:05:45Z) - Efficient World Models with Context-Aware Tokenization [22.84676306124071]
$Delta$-IRIS is a new agent with a world model architecture composed of a discrete autoencoder that encodes deltas between time steps.
In the Crafter benchmark, $Delta$-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches.
arXiv Detail & Related papers (2024-06-27T16:54:12Z) - Locality Sensitive Sparse Encoding for Learning World Models Online [29.124825481348285]
Follow-The-Leader world models are desirable for model-based reinforcement learning.
FTL models need re-training on all accumulated data at every interaction step to achieve FTL.
We show that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay.
arXiv Detail & Related papers (2024-01-23T19:00:02Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
Storm achieves a mean human performance of $126.7%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - Multi-timestep models for Model-based Reinforcement Learning [10.940666275830052]
In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data.
We tackle this issue by using a multi-timestep objective to train one-step models.
We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score.
arXiv Detail & Related papers (2023-10-09T12:42:39Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Transformers are Sample Efficient World Models [1.9444242128493845]
We introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer.
With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games.
arXiv Detail & Related papers (2022-09-01T17:03:07Z) - Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z) - Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.