AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
- URL: http://arxiv.org/abs/2602.00534v1
- Date: Sat, 31 Jan 2026 06:03:43 GMT
- Title: AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
- Authors: Apurba Prasad Padhy, Fernando Camacho, Saibal Mukhopadhyay
- Abstract summary: AIRE-Prune is a post-training pruning method for state space models (SSMs). It reduces each layer's state dimension by directly minimizing long-run output-energy distortion. Across diverse benchmarks, AIRE-Prune reveals substantial redundancy in SISO and MIMO SSMs, with average pruning of 60.8% and an average accuracy drop of 0.29% without retraining, while significantly lowering compute.
- Score: 51.93574339176914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State space models (SSMs) often sacrifice capacity, search space, or stability to offset the memory and compute costs of large state dimensions. We introduce a structured post-training pruning method for SSMs -- AIRE-Prune (Asymptotic Impulse-Response Energy for State PRUN(E)) -- that reduces each layer's state dimension by directly minimizing long-run output-energy distortion. AIRE-Prune assigns every state a closed-form asymptotic impulse-response energy-based score, i.e., the total impulse-response energy it contributes over an infinite horizon (time), and normalizes these scores layer-wise to enable global cross-layer comparison and selection. This extends modal truncation from single systems to deep stacks and aligns pruning with asymptotic response energy rather than worst-case gain. Across diverse sequence benchmarks, AIRE-Prune reveals substantial redundancy in SISO and MIMO SSMs with average pruning of 60.8%, with average accuracy drop of 0.29% without retraining, while significantly lowering compute. Code: https://github.com/falcon-arrow/AIRE-Prune.
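The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes a discrete-time, diagonal, single-input single-output layer in which mode i has the impulse response h_k = c_i * lam_i**k * b_i, so its infinite-horizon energy is the geometric sum |c_i|^2 |b_i|^2 / (1 - |lam_i|^2). The function names, the sum-to-one layer normalization, and the lowest-score-first selection rule are illustrative assumptions, not the paper's exact formulation or the released code's API.

```python
import numpy as np


def mode_energy(lam, b, c):
    """Infinite-horizon impulse-response energy of each diagonal mode.

    Assumes h_k = c * lam**k * b with |lam| < 1, so that
    sum_k |h_k|^2 = |c|^2 |b|^2 / (1 - |lam|^2).
    """
    lam, b, c = (np.asarray(v, dtype=complex) for v in (lam, b, c))
    if np.any(np.abs(lam) >= 1.0):
        raise ValueError("every mode must be strictly stable (|lam| < 1)")
    return (np.abs(c) ** 2) * (np.abs(b) ** 2) / (1.0 - np.abs(lam) ** 2)


def global_keep_masks(layers, prune_fraction):
    """Return one boolean keep-mask per layer.

    layers: list of (lam, b, c) tuples, one per layer (hypothetical layout).
    Scores are normalized to sum to one within each layer (an assumed
    normalization), then the globally lowest-scoring states are pruned.
    """
    scores = []
    for lam, b, c in layers:
        energy = mode_energy(lam, b, c)
        total = max(energy.sum(), np.finfo(float).tiny)  # guard against all-zero layers
        scores.append(energy / total)

    flat = np.concatenate(scores)
    n_prune = int(np.floor(prune_fraction * flat.size))
    order = np.argsort(flat, kind="stable")  # ascending: weakest states first

    keep = np.ones(flat.size, dtype=bool)
    keep[order[:n_prune]] = False

    # Split the flat mask back into per-layer masks.
    boundaries = np.cumsum([s.size for s in scores])[:-1]
    return np.split(keep, boundaries)


if __name__ == "__main__":
    layers = [
        ([0.9, 0.1], [1.0, 1.0], [1.0, 1.0]),
        ([0.5, 0.5, 0.2], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]),
    ]
    for mask in global_keep_masks(layers, prune_fraction=0.4):
        print(mask)
```

Because the scores are normalized within each layer before the global ranking, a layer with uniformly large raw energies does not automatically dominate the selection; whether this matches the normalization used by AIRE-Prune should be checked against the paper and its repository.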
Related papers
- ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
- Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding [12.040413194036383]
Spiking neural networks (SNNs) have emerged as a promising candidate for energy-efficient LLM inference. We propose Matterhorn, a spiking transformer that integrates a novel masked time-to-first-spike encoding method. Matterhorn establishes a new state-of-the-art, surpassing existing SNNs by 1.42% in average accuracy while delivering a 2.31 times improvement in energy efficiency.
arXiv Detail & Related papers (2026-01-30T11:53:42Z)
- A Unified Framework for EEG Seizure Detection Using Universum-Integrated Generalized Eigenvalues Proximal Support Vector Machine [5.725795684434675]
The paper presents novel Universum-enhanced classifiers for EEG signal classification. The proposed models address critical challenges in EEG analysis: non-stationarity, low signal-to-noise ratio, and limited labeled data. The models are evaluated on the Bonn University EEG dataset across two binary classification tasks.
arXiv Detail & Related papers (2025-12-24T13:39:11Z)
- Kernel-Adaptive PI-ELMs for Forward and Inverse Problems in PDEs with Sharp Gradients [0.0]
This paper introduces the Kernel Adaptive Physics-Informed Extreme Learning Machine (KAPI-ELM). It is designed to solve both forward and inverse Partial Differential Equation (PDE) problems involving localized sharp gradients. KAPI-ELM achieves state-of-the-art accuracy in both forward and inverse settings.
arXiv Detail & Related papers (2025-07-14T13:03:53Z)
- TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs [58.19080159470868]
We propose a novel low-rank ZO estimator, TeZO, which captures the low-rankness across both the model and temporal dimension. Specifically, we represent ZO perturbations along the temporal dimension as a 3D tensor and employ Canonical Polyadic Decomposition (CPD) to extract each low-rank 2D matrix.
arXiv Detail & Related papers (2025-01-31T11:34:03Z)
- Layer-Adaptive State Pruning for Deep State Space Models [1.5749416770494706]
We provide a structured pruning method for SSMs, Layer-Adaptive STate pruning (LAST). LAST scores are evaluated using the $\mathcal{H}_\infty$ norms of subsystems and layer-wise energy normalization. We demonstrate that, on average, pruning 33% of states still maintains performance with 0.52% accuracy loss in multi-input multi-output SSMs without retraining.
arXiv Detail & Related papers (2024-11-05T05:50:51Z)
- Entanglement Distribution Delay Optimization in Quantum Networks with Distillation [51.53291671169632]
Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications.
QS resource allocation framework is proposed to enhance the end-to-end (e2e) fidelity and satisfy minimum rate and fidelity requirements.
arXiv Detail & Related papers (2024-05-15T02:04:22Z)
- Optimal Scaling for Locally Balanced Proposals in Discrete Spaces [65.14092237705476]
We show that efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
arXiv Detail & Related papers (2022-09-16T22:09:53Z)
- Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.
arXiv Detail & Related papers (2020-10-15T09:14:02Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.