SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot
- URL: http://arxiv.org/abs/2506.09613v1
- Date: Wed, 11 Jun 2025 11:14:57 GMT
- Title: SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot
- Authors: Kaiwen Tuo, Huan Wang
- Abstract summary: State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix. We introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) framework to state-space architectures.
- Score: 8.080568103779893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet they still comprise billions of parameters, which hinders deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) framework to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) can be easily extended to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning, observe no zero-shot accuracy loss, and set the current state of the art for pruning Mamba-based LLMs.
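The OBS-style scoring the abstract sketches can be illustrated in a few lines. Below is a minimal sketch, assuming a diagonal empirical-Fisher proxy for the Hessian and Mamba-style shapes; the function names, tensor shapes, and per-step gradient inputs are our illustrative assumptions, not the released SparseSSM implementation.

```python
import numpy as np

def sparsessm_style_scores(A_log, grads_per_step):
    """Illustrative OBS-style saliency for a time-shared parameter.

    A_log: (d_inner, d_state) shared transition parameter (Mamba-style).
    grads_per_step: per-time-step gradients of the same shape.
    Uses a diagonal empirical-Fisher proxy for the Hessian, aggregated
    over time steps; under the diagonal approximation the classic OBS
    score w^2 / (2 [H^-1]_qq) reduces to w^2 * H_qq / 2.
    """
    H_diag = sum(g ** 2 for g in grads_per_step)  # aggregate curvature
    return 0.5 * (A_log ** 2) * H_diag

def prune_one_shot(W, scores, sparsity=0.5):
    """Zero the fraction `sparsity` of weights with the lowest saliency."""
    k = int(sparsity * W.size)
    thresh = np.partition(scores.ravel(), k)[k]
    return np.where(scores < thresh, 0.0, W)

# Toy usage with random stand-ins for parameters and per-step gradients.
rng = np.random.default_rng(0)
A_log = rng.normal(size=(64, 16))
grads = [rng.normal(size=A_log.shape) for _ in range(8)]  # 8 time steps
A_pruned = prune_one_shot(A_log, sparsessm_style_scores(A_log, grads))
print("sparsity:", (A_pruned == 0).mean())
```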
Related papers
- StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning [31.585380521480868]
We propose StruMamba3D, a novel paradigm for self-supervised point cloud representation learning. We design spatial states and use them as proxies to preserve spatial dependencies among points. Our method attains state-of-the-art accuracy of 95.1% on ModelNet40 and 92.75% on the most challenging split of ScanObjectNN without a voting strategy.
arXiv Detail & Related papers (2025-06-26T17:58:05Z)
- Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference-latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
- Free Parametrization of L2-bounded State Space Models [0.0]
We introduce L2RU, a novel parametrization of structured state-space models (SSMs) that guarantees input-output stability and robustness. We derive a non-conservative parametrization of square discrete-time LTI systems with a specified L2-bound, forming the foundation of the L2RU architecture.
arXiv Detail & Related papers (2025-03-31T07:56:17Z)
- V"Mean"ba: Visual State Space Models only need 1 hidden dimension [0.7864304771129751]
State Space Models (SSMs) have emerged as a solution by introducing a linear recurrence mechanism. We introduce VMeanba, a training-free compression method that eliminates the channel dimension in SSMs using mean operations. We show that VMeanba achieves up to a 1.12x speedup with less than a 3% accuracy loss; a toy sketch of the mean reduction follows this entry.
arXiv Detail & Related papers (2024-12-21T12:27:07Z)
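A minimal sketch of the mean-over-channels idea in the VMeanba entry above; the tensor layout and shapes are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Hypothetical SSM activation of shape (batch, length, channels, state).
x = np.random.randn(2, 64, 16, 8)

# VMeanba-style reduction (as summarized above): average the channel
# axis so downstream scans process one "mean" channel instead of 16.
x_mean = x.mean(axis=2, keepdims=True)
print(x.shape, "->", x_mean.shape)  # (2, 64, 16, 8) -> (2, 64, 1, 8)
```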
- Layer-Adaptive State Pruning for Deep State Space Models [1.5749416770494706]
We provide a structured pruning method for SSMs, Layer-Adaptive STate pruning (LAST). LAST scores are evaluated using the $\mathcal{H}_{\infty}$ norms of subsystems and layer-wise energy normalization. We demonstrate that, on average, pruning 33% of states still maintains performance with 0.52% accuracy loss in multi-input multi-output SSMs without retraining; a frequency-grid sketch of the scoring idea follows this entry.
arXiv Detail & Related papers (2024-11-05T05:50:51Z)
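The $\mathcal{H}_{\infty}$-based scoring in the LAST entry above can be approximated on a frequency grid. Below is a hedged sketch assuming diagonal state subsystems; the grid approximation and all names are illustrative, not the authors' implementation.

```python
import numpy as np

def hinf_norm_grid(A, B, C, D, n_freq=512):
    """Approximate the H-infinity norm of a discrete-time LTI subsystem
    x_{k+1} = A x_k + B u_k, y_k = C x_k + D u_k, as the largest singular
    value of the transfer function sampled on a frequency grid."""
    n = A.shape[0]
    norms = []
    for w in np.linspace(0.0, np.pi, n_freq):
        G = C @ np.linalg.solve(np.exp(1j * w) * np.eye(n) - A, B) + D
        norms.append(np.linalg.svd(G, compute_uv=False)[0])
    return max(norms)

# Score each diagonal state (1-D subsystem); weak states are candidates
# for pruning.
A = np.diag(np.array([0.95, 0.5, 0.1]))
B = np.ones((3, 1)); C = np.ones((1, 3)); D = np.zeros((1, 1))
scores = [hinf_norm_grid(A[i:i+1, i:i+1], B[i:i+1], C[:, i:i+1], D)
          for i in range(3)]
print(scores)  # small H-infinity contribution => low saliency
```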
- Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion [46.82975707531064]
Selective state space models (SSMs) excel at capturing long-range dependencies in 1D sequential data. We propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space. We show that Spatial-Mamba, even with a single scan, attains or surpasses state-of-the-art SSM-based models in image classification, detection, and segmentation.
arXiv Detail & Related papers (2024-10-19T12:56:58Z)
- DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation [66.8732965660931]
This paper introduces DSLO, a 3D point cloud sequence learning model for LiDAR odometry based on inconsistent spatio-temporal propagation.
It consists of a pyramid structure with a sequential pose module, a hierarchical pose refinement module, and a temporal feature propagation module.
arXiv Detail & Related papers (2024-09-01T15:12:48Z)
- SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for sequential recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
- Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, which employs a Spatial-Spectral SSM for global-local balanced context encoding. Experimental results show that our CS-Mamba achieves state-of-the-art performance and that the masked training method better reconstructs smooth features, improving visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method runs for 2.7 hours with around 35 GB of memory for 13B models on a single A100 GPU; a toy policy-gradient sketch follows this entry.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
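A hedged sketch of learning pruning masks by policy gradient, as the entry above describes: sample Bernoulli masks, score the pruned model's loss, and update mask logits with a REINFORCE estimator. The toy layer, sparsity penalty, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer and calibration data; the real method targets LLM blocks.
W = rng.normal(size=(8, 8))
X = rng.normal(size=(32, 8))
y = X @ W.T                      # dense outputs serve as the target

theta = np.zeros(W.shape)        # logits of per-weight keep-probabilities
baseline, lr, lam = 0.0, 0.5, 0.1
for step in range(500):
    p = 1.0 / (1.0 + np.exp(-theta))
    m = (rng.random(W.shape) < p).astype(float)  # sample a binary mask
    loss = np.mean((X @ (W * m).T - y) ** 2) + lam * m.mean()
    baseline = 0.9 * baseline + 0.1 * loss       # variance reduction
    # REINFORCE: d/dtheta log p(m) = m - p for Bernoulli(sigmoid(theta)),
    # so no back-propagation through the pruned model is needed.
    theta -= lr * (loss - baseline) * (m - p)
print("kept fraction:", (1.0 / (1.0 + np.exp(-theta)) > 0.5).mean())
```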
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs, from CNN to Transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method, which optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale; a toy sketch follows this entry.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
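A hedged reading of the iterative soft shrinkage step described above: instead of hard-zeroing the smallest weights each iteration, shrink them multiplicatively so they can recover later. The threshold rule and shrink factor are illustrative assumptions, not the paper's exact schedule.

```python
import numpy as np

def soft_shrink_step(W, sparsity=0.5, shrink=0.9):
    """One ISS-P-style step (illustrative): rather than hard-pruning the
    smallest weights, multiply them by a factor so the update is
    proportional to each weight's magnitude and remains reversible."""
    k = int(sparsity * W.size)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    mask = np.abs(W) < thresh        # currently "unimportant" weights
    W = W.copy()
    W[mask] *= shrink                # soft shrink, not a hard zero
    return W

W = np.random.randn(16, 16)
for _ in range(10):                  # applied on-the-fly each iteration
    W = soft_shrink_step(W)
```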
- Automated classification of pre-defined movement patterns: A comparison between GNSS and UWB technology [55.41644538483948]
Real-time location systems (RTLS) allow data on human movement patterns to be collected.
The current study aims to design and evaluate an automated framework to classify human movement patterns in small areas.
arXiv Detail & Related papers (2023-03-10T14:46:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.