Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
- URL: http://arxiv.org/abs/2405.12094v1
- Date: Mon, 20 May 2024 15:05:47 GMT
- Title: Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
- Authors: Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen
- Abstract summary: Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL).
This work conducts comprehensive experiments to explore the potential of Decision Mamba (dubbed DeMa) in offline RL.
Our specially designed DeMa is compatible with trajectory optimization and surpasses previous state-of-the-art methods.
- Score: 32.33214392196923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet they pose challenges due to their substantial parameter counts and limited scalability. This is particularly critical in sequential decision-making scenarios where resources are constrained, such as robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, offers performance on par with Transformers while requiring substantially fewer parameters on long sequences. As it remains unclear whether Mamba is compatible with trajectory optimization, this work conducts comprehensive experiments to explore the potential of Decision Mamba (dubbed DeMa) in offline RL from the aspects of data structure and network architecture, with the following insights: (1) Long sequences impose a significant computational burden without improving performance, because DeMa's focus on the sequence decays approximately exponentially with distance; consequently, we introduce a Transformer-like DeMa as opposed to an RNN-like DeMa. (2) Among DeMa's components, the hidden attention mechanism is key to its success; it also works well with other residual structures and does not require position embeddings. Extensive evaluations on eight Atari games demonstrate that our specially designed DeMa is compatible with trajectory optimization and surpasses previous state-of-the-art methods, outperforming Decision Transformer (DT) by 80% with 30% fewer parameters, and exceeding DT in MuJoCo with only a quarter of the parameters.
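For readers unfamiliar with the trajectory-optimization setup, the model consumes Decision-Transformer-style token triples of (return-to-go, state, action) and predicts the next action from its context window. The sketch below shows how a Mamba block could serve as that sequence backbone; it is a minimal illustration rather than the authors' released implementation, the class name and hyperparameters are invented for the example, and the `Mamba` import assumes the public `mamba_ssm` package (whose fused kernels generally require a CUDA device).

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed dependency: pip install mamba-ssm


class DeMaSketch(nn.Module):
    """Decision-Transformer-style trajectory model with a Mamba backbone (illustrative only)."""

    def __init__(self, state_dim: int, act_dim: int, d_model: int = 128):
        super().__init__()
        # One linear embedding per token type, as in Decision Transformer.
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        # The paper reports that the hidden attention mechanism works without
        # position embeddings, so none are added in this sketch.
        self.backbone = Mamba(d_model=d_model)      # selective SSM block
        self.ln = nn.LayerNorm(d_model)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T = states.shape[:2]
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        tokens = torch.stack(
            (self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        h = self.ln(self.backbone(tokens))          # causal scan over the window
        # Predict a_t from the state token of each timestep (positions 1, 4, 7, ...).
        return self.predict_action(h[:, 1::3])


# Shape check only (no training loop); mamba_ssm's kernels generally need a CUDA device.
device = "cuda"
model = DeMaSketch(state_dim=17, act_dim=6).to(device)
rtg = torch.randn(2, 20, 1, device=device)
states = torch.randn(2, 20, 17, device=device)
actions = torch.randn(2, 20, 6, device=device)
print(model(rtg, states, actions).shape)  # torch.Size([2, 20, 6])
```

Used this way, the model re-reads the whole context window at every decision step, which is what the abstract calls a Transformer-like DeMa; an RNN-like DeMa would instead carry the SSM hidden state across environment steps.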
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers reduces the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.
We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning [16.723117379435696]
We propose a novel sequence model for action prediction, named Mamba Decision Maker (MambaDM).
MambaDM is expected to be a promising alternative sequence modeling paradigm, owing to its efficient modeling of multi-scale dependencies.
This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements.
arXiv Detail & Related papers (2024-06-04T06:49:18Z) - Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling [13.253878928833688]
We propose a Decision Mamba-Hybrid (DM-H) for in-context reinforcement learning.
DM-H generates high-value sub-goals from long-term memory through the Mamba model.
Online testing of DM-H in the long-term task is 28 times faster than the transformer-based baselines.
arXiv Detail & Related papers (2024-05-31T10:41:03Z) - MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting [12.08746904573603]
Mamba, based on selective state space models (SSMs), has emerged as a competitive alternative to Transformer.
We propose four targeted improvements, leading to MambaTS.
Experiments conducted on eight public datasets demonstrate that MambaTS achieves new state-of-the-art performance.
arXiv Detail & Related papers (2024-05-26T05:50:17Z) - Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Less is More! A slim architecture for optimal language translation [0.0]
The softmax attention mechanism has emerged as a noteworthy development in the field of Artificial Intelligence research.
We propose KgV, a sigmoid gating mechanism that boosts performance without increasing architecture size.
Our proposed method yields significant improvements in performance and much lower memory cost.
arXiv Detail & Related papers (2023-05-18T14:09:52Z) - Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose a graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges.
Considering the redundancy in existing architectures, we first use mode approximation to generate 0.1M trainable parameters for multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z) - Meta-Learning Adversarial Bandits [49.094361442409785]
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure.
As the first to target the adversarial setting, we design a meta-algorithm with setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-B sequence.
arXiv Detail & Related papers (2022-05-27T17:40:32Z) - NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM [16.464030458567187]
We introduce a new learning framework, called NxMTransformer, to induce NxM semi-structured sparsity on pretrained language models.
We propose to formulate the NxM sparsity as a constrained optimization problem and use the Alternating Direction Method of Multipliers (ADMM) to optimize the downstream tasks (a toy illustration of the NxM pattern follows this entry).
Our proposed method is able to achieve 1.7 points higher accuracy in GLUE score than current practices.
arXiv Detail & Related papers (2021-10-28T17:43:06Z)
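NxM semi-structured sparsity keeps a fixed number of nonzero weights in every group of M consecutive weights (NVIDIA's 2:4 pattern, with 2 nonzeros per group of 4, is the best-known instance). The snippet below is only a toy magnitude-based masking sketch to make that pattern concrete; it is unrelated to NxMTransformer's ADMM-based procedure, and the function name and group sizes are arbitrary choices for the example.

```python
import torch


def nxm_mask(weight: torch.Tensor, n_keep: int = 2, m: int = 4) -> torch.Tensor:
    """Return a {0,1} mask keeping the n_keep largest-magnitude weights in every
    group of m consecutive weights along the last dimension."""
    rows, cols = weight.shape
    assert cols % m == 0, "row length must be divisible by the group size"
    groups = weight.abs().reshape(rows, cols // m, m)
    topk = groups.topk(n_keep, dim=-1).indices            # positions to keep per group
    mask = torch.zeros_like(groups).scatter_(-1, topk, 1.0)
    return mask.reshape(rows, cols)


# Toy usage: a 16x64 weight matrix pruned to the 2:4 pattern.
w = torch.randn(16, 64)
sparse_w = w * nxm_mask(w, n_keep=2, m=4)   # exactly 2 nonzeros per block of 4
```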