Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
- URL: http://arxiv.org/abs/2602.19805v2
- Date: Thu, 26 Feb 2026 06:48:38 GMT
- Title: Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
- Authors: Wall Kim, Chaeyoung Song, Hanul Kim
- Abstract summary: Mamba-based models have drawn much attention in offline RL. We propose a simple yet effective structure, called Decision MetaMamba (DMM). By performing sequence mixing that considers all channels simultaneously before Mamba, DMM prevents information loss due to selective scanning and residual gating.
- Score: 3.5939555573102857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mamba-based models have drawn much attention in offline RL. However, their selective mechanism is often detrimental when key steps in RL sequences are omitted. To address this issue, we propose a simple yet effective structure, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By performing sequence mixing that considers all channels simultaneously before Mamba, DMM prevents information loss due to selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Furthermore, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
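The abstract describes mixing across the sequence dimension, over all channels at once, before the selective scan. The following is a minimal NumPy sketch of that ordering, not the authors' implementation: the dense time-axis mixer and the simple decaying scan are hypothetical stand-ins for DMM's actual sequence mixer and the Mamba SSM.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_sequence_mixer(x, W):
    # x: (T, C). A dense layer applied along the time axis mixes all
    # timesteps for every channel BEFORE the selective scan, so no step
    # can be silently dropped by the scan's gating.
    return W @ x  # (T, T) @ (T, C) -> (T, C)

def toy_selective_scan(x, decay=0.9):
    # Toy stand-in for Mamba's selective SSM: a gated running state.
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out

T, C = 8, 4
x = rng.standard_normal((T, C))
W = rng.standard_normal((T, T)) / np.sqrt(T)  # learned in practice
y = toy_selective_scan(dense_sequence_mixer(x, W))
print(y.shape)  # (8, 4)
```

The key design point is the order of operations: mixing first means the scan sees a representation in which every timestep already carries context from the others.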
Related papers
- Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data [52.07689534063587]
State Space Models (SSMs) have emerged as promising alternatives to attention mechanisms. In this work, we use carefully designed synthetic tasks to reveal Mamba's inherent limitations.
arXiv Detail & Related papers (2025-09-22T08:38:55Z) - HiFi-Mamba: Dual-Stream W-Laplacian Enhanced Mamba for High-Fidelity MRI Reconstruction [5.899756063964437]
High-Fidelity Mamba (HiFi-Mamba) is a novel dual-stream Mamba-based architecture for MRI reconstruction. HiFi-Mamba consistently outperforms state-of-the-art CNN-based, Transformer-based, and other Mamba-based models in reconstruction accuracy.
arXiv Detail & Related papers (2025-08-07T10:08:18Z) - Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations. We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
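A sparse mixture of linear projection experts can be sketched as follows. This is a generic top-k routed projection in NumPy, not RoM's actual design; the function and parameter names (`moe_projection`, `router_w`, `top_k`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def moe_projection(x, experts, router_w, top_k=2):
    # x: (T, D) tokens; experts: (E, D, D) linear projections;
    # router_w: (D, E) routing weights. Each token is projected by only
    # its top-k experts, weighted by a softmax over their router scores.
    logits = x @ router_w                          # (T, E)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top[t]
        w = np.exp(logits[t, sel] - logits[t, sel].max())
        w /= w.sum()                               # softmax over selected
        for weight, e in zip(w, sel):
            out[t] += weight * (experts[e] @ x[t])
    return out

T, D, E = 6, 4, 3
x = rng.standard_normal((T, D))
experts = rng.standard_normal((E, D, D))
router_w = rng.standard_normal((D, E))
y = moe_projection(x, experts, router_w)
print(y.shape)  # (6, 4)
```

Because only k of E experts fire per token, parameter count scales with E while per-token compute scales with k, which is the usual motivation for sparse expert routing.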
arXiv Detail & Related papers (2025-06-22T19:26:55Z) - Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining [75.14823970163685]
We propose a novel preference learning framework, Modality-Balancing Preference Optimization (MBPO), to address the modality imbalance in LMMs. MBPO constructs a more effective offline preference dataset by generating hard negatives, i.e., rejected responses misled by LLM biases. It can enhance LMM performance on challenging vision-language tasks and effectively reduce hallucinations.
arXiv Detail & Related papers (2025-05-20T03:59:05Z) - Sparse Deformable Mamba for Hyperspectral Image Classification [1.3471768511567523]
Mamba models significantly improve hyperspectral image (HSI) classification. One critical challenge is the difficulty in building the sequence of Mamba tokens efficiently. This paper presents a Sparse Deformable Mamba (SDMamba) approach for enhanced HSI classification.
arXiv Detail & Related papers (2025-04-13T06:08:19Z) - Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge. State Space Models (SSMs) have achieved notable success in computer vision. We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z) - Integrating Multi-Modal Input Token Mixer Into Mamba-Based Decision Models: Decision MetaMamba [0.0]
Sequence modeling with State Space Models (SSMs) has demonstrated performance surpassing that of Transformers in various tasks. However, decision models based on Mamba, a state-of-the-art SSM, failed to achieve superior performance compared to enhanced Decision Transformers. We propose the Decision MetaMamba (DMM), which augments Mamba with a token mixer in its input layer.
arXiv Detail & Related papers (2024-08-20T03:35:28Z) - Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
We propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
arXiv Detail & Related papers (2024-06-08T10:12:00Z) - Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning [16.23977055134524]
We propose a novel action predictor sequence, named Mamba Decision Maker (MambaDM).
MambaDM is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies.
This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements.
arXiv Detail & Related papers (2024-06-04T06:49:18Z) - Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling [13.253878928833688]
We propose a Decision Mamba-Hybrid (DM-H) for in-context reinforcement learning.
DM-H generates high-value sub-goals from long-term memory through the Mamba model.
Online testing of DM-H in the long-term task is 28$\times$ faster than the transformer-based baselines.
arXiv Detail & Related papers (2024-05-31T10:41:03Z) - Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning [66.18202188565922]
We propose a communication-efficient decentralized machine learning (ML) algorithm, coined Quantized Group ADMM (Q-GADMM). We develop a novel quantization method to adaptively adjust model quantization levels and their probabilities, while proving the convergence of Q-GADMM for convex functions.
arXiv Detail & Related papers (2019-10-23T10:47:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.