Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces
- URL: http://arxiv.org/abs/2403.19925v1
- Date: Fri, 29 Mar 2024 02:25:55 GMT
- Title: Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces
- Authors: Toshihiro Ota,
- Abstract summary: Mamba is known for its advanced capabilities in efficient and effective sequence modeling.
This paper investigates the integration of the Mamba framework, known for its advanced capabilities in efficient and effective sequence modeling, into the Decision Transformer architecture.
- Score: 0.32634122554914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision Transformer, a promising approach that applies Transformer architectures to reinforcement learning, relies on causal self-attention to model sequences of states, actions, and rewards. While this method has shown competitive results, this paper investigates the integration of the Mamba framework, known for its advanced capabilities in efficient and effective sequence modeling, into the Decision Transformer architecture, focusing on the potential performance enhancements in sequential decision-making tasks. Our study systematically evaluates this integration by conducting a series of experiments across various decision-making environments, comparing the modified Decision Transformer, Decision Mamba, with its traditional counterpart. This work contributes to the advancement of sequential decision-making models, suggesting that the architecture and training methodology of neural networks can significantly impact their performance in complex tasks, and highlighting the potential of Mamba as a valuable tool for improving the efficacy of Transformer-based models in reinforcement learning scenarios.
Related papers
- DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting [37.334947053450996]
We introduce a novel approach that combines the Dreamer algorithm's ability to generate anticipatory trajectories with the adaptive strengths of the Online Decision Transformer.
Our methodology enables parallel training where Dreamer-produced trajectories enhance the contextual decision-making of the transformer.
arXiv Detail & Related papers (2024-10-15T07:27:56Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models.
The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z) - Decision Mamba Architectures [1.4255659581428335]
Decision Mamba architecture has shown to outperform Transformers across various task domains.
We introduce two novel methods, Decision Mamba (DM) and Hierarchical Decision Mamba (HDM)
We demonstrate the superiority of Mamba models over their Transformer counterparts in a majority of tasks.
arXiv Detail & Related papers (2024-05-13T17:18:08Z) - Task adaption by biologically inspired stochastic comodulation [8.59194778459436]
We show that fine-tuning convolutional networks by gain modulation improves on deterministic gain modulation.
Our results suggest that comodulation representations can enhance learning efficiency and performance in multi-task learning.
arXiv Detail & Related papers (2023-11-25T15:21:03Z) - Decision Stacks: Flexible Reinforcement Learning via Modular Generative
Models [37.79386205079626]
Decision Stacks is a generative framework that decomposes goal-conditioned policy agents into 3 generative modules.
These modules simulate the temporal evolution of observations, rewards, and actions via independent generative models that can be learned in parallel via teacher forcing.
Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferrability across domains, and inference speed.
arXiv Detail & Related papers (2023-06-09T20:52:16Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z) - BayesFormer: Transformer with Uncertainty Estimation [31.206243748162553]
We introduce BayesFormer, a Transformer model with dropouts designed by Bayesian theory.
We show improvements across the board: language modeling and classification, long-sequence understanding, machine translation and acquisition function for active learning.
arXiv Detail & Related papers (2022-06-02T01:54:58Z) - Emergent Hand Morphology and Control from Optimizing Robust Grasps of
Diverse Objects [63.89096733478149]
We introduce a data-driven approach where effective hand designs naturally emerge for the purpose of grasping diverse objects.
We develop a novel Bayesian Optimization algorithm that efficiently co-designs the morphology and grasping skills jointly.
We demonstrate the effectiveness of our approach in discovering robust and cost-efficient hand morphologies for grasping novel objects.
arXiv Detail & Related papers (2020-12-22T17:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.