Efficiently Modeling Long Sequences with Structured State Spaces
- URL: http://arxiv.org/abs/2111.00396v1
- Date: Sun, 31 Oct 2021 03:32:18 GMT
- Title: Efficiently Modeling Long Sequences with Structured State Spaces
- Authors: Albert Gu, Karan Goel, Christopher Ré
- Abstract summary: We propose the Structured State Space (S4) sequence model, based on a new parameterization of the fundamental state space model (SSM).
S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet.
- Score: 15.456254157293836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central goal of sequence modeling is designing a single principled model
that can address sequence data across a range of modalities and tasks,
particularly on long-range dependencies. Although conventional models including
RNNs, CNNs, and Transformers have specialized variants for capturing long
dependencies, they still struggle to scale to very long sequences of $10000$ or
more steps. A promising recent approach proposed modeling sequences by
simulating the fundamental state space model (SSM) \( x'(t) = Ax(t) + Bu(t),
y(t) = Cx(t) + Du(t) \), and showed that for appropriate choices of the state
matrix \( A \), this system could handle long-range dependencies mathematically
and empirically. However, this method has prohibitive computation and memory
requirements, rendering it infeasible as a general sequence modeling solution.
We propose the Structured State Space (S4) sequence model based on a new
parameterization for the SSM, and show that it can be computed much more
efficiently than prior approaches while preserving their theoretical strengths.
Our technique involves conditioning \( A \) with a low-rank correction,
allowing it to be diagonalized stably and reducing the SSM to the well-studied
computation of a Cauchy kernel. S4 achieves strong empirical results across a
diverse range of established benchmarks, including (i) 91\% accuracy on
sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with
a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on
image and language modeling tasks, while performing generation $60\times$
faster, and (iii) SoTA on every task from the Long Range Arena benchmark, including
solving the challenging Path-X task of length 16k that all prior work fails on,
while being as efficient as all competitors.
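To make the SSM above concrete, the sketch below (a minimal illustration, not the authors' implementation) discretizes \( x'(t) = Ax(t) + Bu(t),\ y(t) = Cx(t) + Du(t) \) with the bilinear transform and evaluates it in the two equivalent ways S4 exploits: as a step-by-step linear recurrence (cheap autoregressive generation) and as a convolution with a precomputed kernel (parallelizable training). The matrices and step size are random placeholders rather than the paper's HiPPO initialization and normal-plus-low-rank structure, and \( D \) is set to zero.

```python
import numpy as np

def discretize(A, B, step):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    Ab = inv @ (I + (step / 2.0) * A)   # discrete-time state matrix
    Bb = inv @ (step * B)               # discrete-time input matrix
    return Ab, Bb

def ssm_recurrence(Ab, Bb, C, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k,  y_k = C x_k (D omitted)."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb[:, 0] * u_k
        ys.append(float(C[0] @ x))
    return np.array(ys)

def ssm_convolution(Ab, Bb, C, u):
    """Convolutional view: y = Kbar * u with kernel Kbar_k = C Ab^k Bb."""
    L = len(u)
    K = np.array([(C @ np.linalg.matrix_power(Ab, k) @ Bb).item() for k in range(L)])
    return np.convolve(u, K)[:L]        # causal convolution, truncated to length L

rng = np.random.default_rng(0)
N, L = 4, 16                            # toy state size and sequence length
A = rng.normal(size=(N, N)) - N * np.eye(N)   # shift spectrum left so the SSM is stable
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)

Ab, Bb = discretize(A, B, step=0.1)
assert np.allclose(ssm_recurrence(Ab, Bb, C, u),
                   ssm_convolution(Ab, Bb, C, u))   # both views agree
```

The recurrence costs O(1) work per generated step, while the convolutional view computes the whole output in parallel (via FFT for long sequences); S4's contribution is computing that kernel efficiently for its structured \( A \) through a Cauchy-kernel computation rather than the naive matrix powers used here.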
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a world model based on Mamba, a state space model (SSM).
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z)
- Oscillatory State-Space Models [61.923849241099184]
We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences.
A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model (a minimal scan sketch appears after this list).
We show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions.
arXiv Detail & Related papers (2024-10-04T22:00:13Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction into a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to address the lack of long-range dependencies.
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment, while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors [44.5740422079]
We show that pretraining with standard denoising objectives leads to dramatic gains across multiple architectures.
In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained.
arXiv Detail & Related papers (2023-10-04T17:17:06Z)
- Continuous-time convolutions model of event sequences [46.3471121117337]
Event sequences are non-uniform and sparse, making traditional models unsuitable.
We propose COTIC, a method based on an efficient convolution neural network designed to handle the non-uniform occurrence of events over time.
COTIC outperforms existing models in predicting the next event time and type, achieving an average rank of 1.5 compared to 3.714 for the nearest competitor.
arXiv Detail & Related papers (2023-02-13T10:34:51Z)
- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies [0.0]
We compare existing solutions to long-sequence modeling in terms of their pure mathematical formulation.
We then demonstrate that long context length does yield better performance, albeit application-dependent.
Inspired by emerging sparse models of huge capacity, we propose a machine learning system for handling million-scale dependencies.
arXiv Detail & Related papers (2023-02-13T09:47:31Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
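Several of the entries above (Drama's $O(n)$ Mamba-based world model, LinOSS's fast associative parallel scans) rely on the same observation: a linear state recurrence x_{k+1} = A_k x_k + b_k can be evaluated with an associative scan, so the sequential loop can be parallelized. The sketch below is a toy numpy illustration of that combine rule, with made-up shapes and values; it is not code from any of the listed papers.

```python
import numpy as np

def combine(first, second):
    """Compose two affine updates x -> A x + b; 'second' is applied after 'first'.
    This operation is associative, which is what makes a parallel scan possible."""
    A1, b1 = first
    A2, b2 = second
    return A2 @ A1, A2 @ b1 + b2

def scan(elems):
    """Inclusive scan over affine updates by divide and conquer.
    Element k of the result maps x_0 directly to x_{k+1}."""
    if len(elems) == 1:
        return elems
    mid = len(elems) // 2
    left, right = scan(elems[:mid]), scan(elems[mid:])
    carry = left[-1]
    return left + [combine(carry, e) for e in right]   # combines at one level are independent

rng = np.random.default_rng(0)
N, L = 3, 8                                  # toy state size and sequence length
As = [np.eye(N) * 0.9 for _ in range(L)]     # per-step transition matrices
bs = [rng.normal(size=N) for _ in range(L)]
x0 = rng.normal(size=N)

# Scan result vs. the plain sequential recurrence x_{k+1} = A_k x_k + b_k.
xs_scan = [A @ x0 + b for A, b in scan(list(zip(As, bs)))]
x, xs_seq = x0, []
for A, b in zip(As, bs):
    x = A @ x + b
    xs_seq.append(x)
assert all(np.allclose(a, b) for a, b in zip(xs_scan, xs_seq))
```

On parallel hardware the combines at each level of the recursion run independently, which is what lets scan-based SSMs train in parallel despite being recurrent.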
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.