Related papers: EfficientState Space Model viaFast Tensor Convolutionand Block Diagonalization

EfficientState Space Model viaFast Tensor Convolutionand Block Diagonalization

URL: http://arxiv.org/abs/2402.15290v3
Date: Sun, 06 Oct 2024 15:14:53 GMT
Title: EfficientState Space Model viaFast Tensor Convolutionand Block Diagonalization
Authors: Tongyi Liang, Han-Xiong Li,
Abstract summary: We propose a new state space layer based on multiple-input multiple-output SSM, called efficient SSM. Our eSSM is built on the convolutional representation of multi-input and multi-input (MIMO) SSM. In the model efficiency benchmark, the parameters of eSSM are only 12.89% of LSTM and 13.24% of Mamba.
Score: 5.260841516691153
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing models encounter bottlenecks in balancing performance and computational efficiency when modeling long sequences. Although the state space model (SSM) has achieved remarkable success in handling long sequence tasks, it still faces the problem of large number of parameters. In order to further improve the efficiency of SSM, we propose a new state space layer based on multiple-input multiple-output SSM, called efficient SSM (eSSM). Our eSSM is built on the convolutional representation of multi-input and multi-input (MIMO) SSM. We propose a variety of effective strategies to improve the computational efficiency. The diagonalization of the system matrix first decouples the original system. Then a fast tensor convolution is proposed based on the fast Fourier transform. In addition, the block diagonalization of the SSM further reduces the model parameters and improves the model flexibility. Extensive experimental results show that the performance of the proposed model on multiple databases matches the performance of state-of-the-art models, such as S4, and is significantly better than Transformers and LSTM. In the model efficiency benchmark, the parameters of eSSM are only 12.89\% of LSTM and 13.24\% of Mamba. The training speed of eSSM is 3.94 times faster than LSTM and 1.35 times faster than Mamba. Code is available at: \href{https://github.com/leonty1/essm}{https://github.com/leonty1/essm}.

Related papers

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models [11.472831744634156]
State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. To overcome this, quantizing SSMs with low bit-width data formats can reduce model size and benefit from hardware acceleration. We present Quamba2, compatible with W8A8, W4A8, and W4A16 for both Mamba1 and Mamba2 backbones.
arXiv Detail & Related papers (2025-03-28T21:10:39Z)
Sliding Window Attention Training for Efficient Large Language Models [55.56483740523027]
We introduce SWAT, which enables efficient long-context handling via Sliding Window Attention Training. This paper first attributes the inefficiency of Transformers to the attention sink phenomenon resulting from the high variance of softmax operation. Experiments demonstrate that SWAT achieves SOTA performance compared with state-of-the-art linear recurrent architectures on eight benchmarks.
arXiv Detail & Related papers (2025-02-26T05:31:44Z)
Parameter-Efficient Fine-Tuning of State Space Models [10.817729275974829]
Deep State Space Models (SSMs) have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. This paper investigates the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models. We propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules.
arXiv Detail & Related papers (2024-10-11T17:30:28Z)
GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity. However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences. We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators. Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters. We propose SpIEL, a novel sparse finetuning method which maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values. We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
arXiv Detail & Related papers (2024-01-29T18:43:49Z)
A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network. We then prune the redundancy blocks of the model and maintain the network performance. Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces [31.985243136674146]
Foundation models are almost universally based on the Transformer architecture and its core attention module. We identify that a key weakness of such models is their inability to perform content-based reasoning. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even blocks (Mamba) As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.
arXiv Detail & Related papers (2023-12-01T18:01:34Z)
Accelerating Toeplitz Neural Network with Constant-time Inference Complexity [21.88774274472737]
Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks. They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexities. In this paper, we aim to combine the strengths of TNNs and State Space Models (SSMs) by converting TNNs to SSMs during inference.
arXiv Detail & Related papers (2023-11-15T07:50:57Z)
Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-rangetemporal modeling. It significantly outperforms Transformers and ConvNISTTM on a long horizon Moving-Lab experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose A graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges. Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal prompt tuning. A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)
Compressing LSTM Networks by Matrix Product Operators [7.395226141345625]
Long Short Term Memory(LSTM) models are the building blocks of many state-of-the-art natural language processing(NLP) and speech enhancement(SE) algorithms. Here we introduce the MPO decomposition, which describes the local correlation of quantum states in quantum many-body physics. We propose a matrix product operator(MPO) based neural network architecture to replace the LSTM model.
arXiv Detail & Related papers (2020-12-22T11:50:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.