There is HOPE to Avoid HiPPOs for Long-memory State Space Models
- URL: http://arxiv.org/abs/2405.13975v1
- Date: Wed, 22 May 2024 20:20:14 GMT
- Title: There is HOPE to Avoid HiPPOs for Long-memory State Space Models
- Authors: Annan Yu, Michael W. Mahoney, N. Benjamin Erichson
- Abstract summary: State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilizes parameters within Hankel operators.
Our model efficiently implements these innovations by nonuniformly sampling the transfer functions of LTI systems.
- Score: 51.66430224089725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences. However, these models typically face several challenges: (i) they require specifically designed initializations of the system matrices to achieve state-of-the-art performance, (ii) they require training of state matrices on a logarithmic scale with very small learning rates to prevent instabilities, and (iii) they require the model to have exponentially decaying memory in order to ensure an asymptotically stable LTI system. To address these issues, we view SSMs through the lens of Hankel operator theory, which provides us with a unified theory for the initialization and training of SSMs. Building on this theory, we develop a new parameterization scheme, called HOPE, for LTI systems that utilizes Markov parameters within Hankel operators. This approach allows for random initializations of the LTI systems and helps to improve training stability, while also providing the SSMs with non-decaying memory capabilities. Our model efficiently implements these innovations by nonuniformly sampling the transfer functions of LTI systems, and it requires fewer parameters compared to canonical SSMs. When benchmarked against HiPPO-initialized models such as S4 and S4D, an SSM parameterized by Hankel operators demonstrates improved performance on Long-Range Arena (LRA) tasks. Moreover, we use a sequential CIFAR-10 task with padded noise to empirically corroborate our SSM's long memory capacity.
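To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of parameterizing an LTI system directly by its Markov parameters, i.e. the entries that populate the Hankel operator, and applying it to a sequence via FFT-based causal convolution. The paper's nonuniform sampling of the transfer function is not reproduced here, and the array names and sizes are illustrative assumptions.

```python
import numpy as np

def ssm_from_markov_params(markov_params, u):
    """Apply an LTI system, given only its Markov parameters h_1..h_n
    (the entries that populate the Hankel operator), to an input sequence u
    via FFT-based causal convolution: y[t] = sum_k h[k] * u[t - k]."""
    T, n = len(u), len(markov_params)
    L = T + n                                  # pad so circular == linear convolution
    H = np.fft.rfft(markov_params, L)
    U = np.fft.rfft(u, L)
    return np.fft.irfft(H * U, L)[:T]

# Random initialization of the Markov parameters: in this parameterization no
# HiPPO-style initialization of state matrices is needed (per the abstract).
rng = np.random.default_rng(0)
h = rng.standard_normal(64) / np.sqrt(64)      # illustrative length and scaling
u = rng.standard_normal(256)
y = ssm_from_markov_params(h, u)
print(y.shape)                                 # (256,)
```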
Related papers
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
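As a rough illustration of the real-versus-complex distinction discussed above (a toy sketch, not the paper's construction), the snippet below runs a diagonal SSM recurrence with a real and with a complex diagonal state matrix; eigenvalues off the real axis produce decaying oscillations that a real diagonal system of the same state dimension cannot.

```python
import numpy as np

def diagonal_ssm(a, b, c, u):
    """SISO diagonal SSM: x_{k+1} = diag(a) x_k + b u_k,  y_k = Re(c . x_k).
    The vectors a, b, c may be real or complex."""
    x = np.zeros_like(a)
    ys = []
    for u_k in u:
        x = a * x + b * u_k
        ys.append(np.real(np.dot(c, x)))
    return np.array(ys)

impulse = np.zeros(50)
impulse[0] = 1.0

# Real diagonal: the impulse response is a sum of monotonically decaying exponentials.
y_real = diagonal_ssm(np.array([0.9, 0.7]), np.ones(2), np.ones(2), impulse)

# Complex diagonal: eigenvalues off the real axis yield decaying oscillations.
a_cplx = 0.95 * np.exp(1j * np.array([0.3, -0.3]))
y_cplx = diagonal_ssm(a_cplx, np.ones(2, dtype=complex), np.ones(2, dtype=complex), impulse)
```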
arXiv Detail & Related papers (2024-10-17T22:35:50Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Semantic Codebook Learning for Dynamic Recommendation Models [55.98259490159084]
Dynamic sequential recommendation (DSR) can generate model parameters based on user behavior to improve personalization of sequential recommendation.
It faces the challenges of a large parameter search space and sparse, noisy user-item interactions, which reduce the applicability of the generated model parameters.
The Semantic Codebook Learning for Dynamic Recommendation Models (SOLID) framework presents a significant advancement in DSR by effectively tackling these challenges.
arXiv Detail & Related papers (2024-07-31T19:25:25Z) - SMR: State Memory Replay for Long Sequence Modeling [19.755738298836526]
This paper proposes a novel non-recursive non-uniform sample processing strategy to overcome compatibility limitations in parallel convolutional computation.
We introduce State Memory Replay (SMR), which utilizes learnable memories to adjust the current state with multi-step information for generalization at sampling points different from those in the training data.
Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.
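The summary does not spell out the exact update rule, so the following is only an assumed illustrative form of "adjusting the current state with multi-step information": a learnable memory bank and mixing weights combine a window of past states into a correction of the newest one. All names and shapes here are hypothetical.

```python
import numpy as np

def state_memory_replay(states, memory, mix):
    """Illustrative adjustment of the current state with multi-step information.

    states: (k, d) -- the last k hidden states (most recent last).
    memory: (k, d) -- learnable per-step memories (assumed form).
    mix:    (k,)   -- learnable mixing weights.
    Returns an adjusted current state of shape (d,)."""
    return states[-1] + np.einsum("k,kd->d", mix, memory * states)

rng = np.random.default_rng(0)
k, d = 4, 8
out = state_memory_replay(rng.standard_normal((k, d)),
                          rng.standard_normal((k, d)),
                          rng.standard_normal(k))
print(out.shape)  # (8,)
```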
arXiv Detail & Related papers (2024-05-27T17:53:32Z) - Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models [73.48675708831328]
We propose a novel parameter- and computation-efficient tuning method for Multi-modal Large Language Models (MLLMs).
The Efficient Attention Skipping (EAS) method evaluates the attention redundancy and skips the less important MHAs to speed up inference.
The experiments show that EAS not only retains high performance and parameter efficiency, but also greatly speeds up inference speed.
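The mechanism is only described at a high level, so the sketch below is a hypothetical rendering of attention skipping: each transformer block carries a precomputed redundancy score, and the multi-head attention sublayer is bypassed (leaving the residual path) when the score marks it as unimportant. Function and variable names are assumptions, not the paper's API.

```python
import numpy as np

def forward_with_attention_skipping(x, layers, redundancy_scores, threshold):
    """Hypothetical sketch: run a stack of (attn_fn, mlp_fn) layer pairs, but
    skip the attention sublayer whenever its redundancy score exceeds a
    threshold (i.e. the MHA is judged unimportant), keeping the residual path."""
    for (attn_fn, mlp_fn), score in zip(layers, redundancy_scores):
        if score < threshold:          # important enough -> keep the MHA
            x = x + attn_fn(x)
        # else: attention skipped entirely; only the residual stream passes
        x = x + mlp_fn(x)
    return x

# Toy usage with identity-like sublayers standing in for real MHA / MLP blocks.
layers = [(lambda x: 0.1 * x, lambda x: 0.05 * x) for _ in range(4)]
scores = [0.2, 0.9, 0.1, 0.8]          # assumed precomputed redundancy scores
y = forward_with_attention_skipping(np.ones(16), layers, scores, threshold=0.5)
```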
arXiv Detail & Related papers (2024-03-22T14:20:34Z) - Efficient State Space Model via Fast Tensor Convolution and Block Diagonalization [5.260841516691153]
We propose a new state space layer based on multiple-input multiple-output SSMs, called efficient SSM (eSSM).
Our eSSM is built on the convolutional representation of multi-input multi-output (MIMO) SSMs.
In the model efficiency benchmark, the parameters of eSSM are only 12.89% of LSTM and 13.24% of Mamba.
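For reference, here is a plain sketch of the convolutional representation of a MIMO SSM that eSSM builds on (the paper's fast tensor convolution and block diagonalization are not reproduced): the kernel C A^k B is materialized and convolved with the input.

```python
import numpy as np

def mimo_ssm_convolution(A, B, C, u, L):
    """Convolutional view of a MIMO state-space model:
        y_t = sum_{k=0}^{L-1} (C A^k B) u_{t-k}.
    A: (n, n), B: (n, p), C: (q, n), u: (T, p). Returns y: (T, q)."""
    T = u.shape[0]
    # Materialize the (L, q, p) convolution kernel from the state matrices.
    kernel = np.empty((L, C.shape[0], B.shape[1]))
    Ak = np.eye(A.shape[0])
    for k in range(L):
        kernel[k] = C @ Ak @ B
        Ak = Ak @ A
    y = np.zeros((T, C.shape[0]))
    for t in range(T):
        for k in range(min(L, t + 1)):
            y[t] += kernel[k] @ u[t - k]
    return y

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4); B = rng.standard_normal((4, 2)); C = rng.standard_normal((3, 4))
y = mimo_ssm_convolution(A, B, C, rng.standard_normal((100, 2)), L=32)
print(y.shape)  # (100, 3)
```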
arXiv Detail & Related papers (2024-02-23T12:36:31Z) - Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
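As background for the benchmark, here is a minimal sketch of BP-free zeroth-order optimization in its simplest two-point ZO-SGD form (the paper studies a wider family of ZO methods; the step size and smoothing radius here are arbitrary toy values).

```python
import numpy as np

def zo_sgd_step(params, loss_fn, lr=1e-3, mu=1e-3, rng=None):
    """One step of zeroth-order SGD: estimate the gradient from two forward
    passes (no backpropagation) along a random direction z:
        g_hat = (loss(w + mu*z) - loss(w - mu*z)) / (2*mu) * z
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)
    g_scale = (loss_fn(params + mu * z) - loss_fn(params - mu * z)) / (2 * mu)
    return params - lr * g_scale * z

# Toy usage: minimize a quadratic without ever computing its gradient.
w = np.ones(10)
loss = lambda p: np.sum(p ** 2)
for _ in range(1000):
    w = zo_sgd_step(w, loss, lr=1e-2)
print(loss(w))  # much smaller than the starting loss of 10.0
```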
arXiv Detail & Related papers (2024-02-18T14:08:48Z) - StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization [12.707050104493218]
We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs.
Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary.
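One way to picture a stable reparameterization (an illustrative choice, not necessarily the paper's exact map) is to express each recurrent weight through a function whose range lies strictly inside the stability region, so no setting of the trainable parameter can cross the boundary.

```python
import numpy as np

def stable_reparam(w):
    """One possible stable reparameterization (illustrative, not necessarily the
    paper's): map an unconstrained trainable vector w to recurrent weights
    lambda = exp(-softplus(w)) in (0, 1), so the linear recurrence
    x_{k+1} = lambda * x_k + u_k is stable for every value of w."""
    softplus = np.logaddexp(0.0, w)        # log(1 + exp(w)), always > 0
    return np.exp(-softplus)               # strictly inside (0, 1)

def run_recurrence(w, u):
    lam = stable_reparam(w)
    x = np.zeros_like(lam)
    for u_k in u:
        x = lam * x + u_k
    return x

w = np.array([-5.0, 0.0, 5.0])             # unconstrained parameters
print(stable_reparam(w))                    # all values lie strictly in (0, 1)
```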
arXiv Detail & Related papers (2023-11-24T14:08:31Z) - Switching Autoregressive Low-rank Tensor Models [12.461139675114818]
We propose switching autoregressive low-rank tensor (SALT) models.
SALT parameterizes the tensor of an ARHMM with a low-rank factorization to control the number of parameters.
We prove theoretical connections and discuss practical connections between SALT, linear dynamical systems, and SLDSs.
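A toy sketch of the low-rank idea (the exact factor sharing in SALT may differ): the per-state autoregressive tensor of an ARHMM is built from low-rank factors, so the parameter count scales with the rank rather than with the full lag-by-dimension size. All shapes and names below are illustrative assumptions.

```python
import numpy as np

def build_ar_tensor(U, V):
    """Low-rank construction of per-state autoregressive dynamics:
    U: (K, D, r), V: (K, r, D * L)  ->  A: (K, D, D, L) with A_k = U_k @ V_k."""
    K, D, r = U.shape
    L = V.shape[2] // D
    A = np.einsum("kdr,kre->kde", U, V)    # (K, D, D * L)
    return A.reshape(K, D, D, L)

def arhmm_predict(A, b, history, z):
    """Mean prediction of an ARHMM in discrete state z:
    y_t ~ sum_l A[z, :, :, l] @ y_{t-l} + b[z], history = [y_{t-1}, ..., y_{t-L}]."""
    return sum(A[z, :, :, l] @ history[l] for l in range(A.shape[3])) + b[z]

rng = np.random.default_rng(0)
K, D, r, L = 3, 5, 2, 4                    # states, obs dim, rank, AR lags
A = build_ar_tensor(rng.standard_normal((K, D, r)), rng.standard_normal((K, r, D * L)))
y_pred = arhmm_predict(A, rng.standard_normal((K, D)), rng.standard_normal((L, D)), z=1)
print(y_pred.shape)  # (5,)
```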
arXiv Detail & Related papers (2023-06-05T22:25:28Z) - Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs [1.7969777786551424]
Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks.
We propose an optimized spiking long short-term memory (LSTM) training framework that involves a novel ANN-to-SNN conversion step, followed by SNN training.
We evaluate our framework on sequential learning tasks including temporal MNIST, Google Speech Commands (GSC), and UCI Smartphone datasets on different LSTM architectures.
arXiv Detail & Related papers (2022-10-23T04:10:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.