Rethinking Token Reduction for State Space Models
- URL: http://arxiv.org/abs/2410.14725v1
- Date: Wed, 16 Oct 2024 00:06:13 GMT
- Title: Rethinking Token Reduction for State Space Models
- Authors: Zheng Zhan, Yushu Wu, Zhenglun Kong, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang,
- Abstract summary: We propose a tailored, unified post-training token reduction method for State Space Models (SSMs)
Our approach integrates token importance and similarity, thus taking advantage of both pruning and merging.
Our method improves the average accuracy by 5.7% to 13.1% on six benchmarks with Mamba-2 compared to existing methods.
- Score: 47.00760373683448
- License:
- Abstract: Recent advancements in State Space Models (SSMs) have attracted significant interest, particularly in models optimized for parallel training and handling long-range dependencies. Architectures like Mamba have scaled to billions of parameters with selective SSM. To facilitate broader applications using Mamba, exploring its efficiency is crucial. While token reduction techniques offer a straightforward post-training strategy, we find that applying existing methods directly to SSMs leads to substantial performance drops. Through insightful analysis, we identify the reasons for this failure and the limitations of current techniques. In response, we propose a tailored, unified post-training token reduction method for SSMs. Our approach integrates token importance and similarity, thus taking advantage of both pruning and merging, to devise a fine-grained intra-layer token reduction strategy. Extensive experiments show that our method improves the average accuracy by 5.7% to 13.1% on six benchmarks with Mamba-2 compared to existing methods, while significantly reducing computational demands and memory requirements.
Related papers
- Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge.
State Space Models (SSMs) have achieved notable success in computer vision.
We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z) - SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs [2.7624021966289605]
Large Language Models (LLMs) have revolutionized natural language understanding and generation tasks.
LLMs suffer from high memory consumption and slow inference times due to their large parameter sizes.
This paper introduces SLiM, a novel approach for compressing LLMs using a one-shot Quantized Sparse Plus Low-rank Approximation.
arXiv Detail & Related papers (2024-10-12T18:36:07Z) - Exploring Token Pruning in Vision State Space Models [38.122017567843905]
State Space Models (SSMs) have the advantage of keeping linear computational complexity compared to attention modules in transformers.
We take the novel step of enhancing the efficiency of SSM-based vision models through token-based pruning.
We achieve 81.7% accuracy on ImageNet with a 41.6% reduction in the FLOPs for pruned PlainMamba-L3.
arXiv Detail & Related papers (2024-09-27T17:59:50Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty [18.611360495409087]
Probabilistic State Space Models (SSMs) are essential for Reinforcement Learning (RL) from high-dimensional, partial information as they provide concise representations for control.
We propose KalMamba, an efficient architecture to learn representations for RL that combines the strengths of probabilistic SSMs with the scalability of deterministic SSMs.
arXiv Detail & Related papers (2024-06-21T13:27:36Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts [49.01990048827639]
We introduce SEER-MoE, a framework for reducing both the memory footprint and compute requirements of pre-trained MoE models.
The first stage involves pruning the total number of experts using a heavy-hitters counting guidance, while the second stage employs a regularization-based fine-tuning strategy to recover accuracy loss.
Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoEs model optimized for inference efficiency with minimal accuracy trade-offs.
arXiv Detail & Related papers (2024-04-07T22:13:43Z) - Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.