Efficiently Modeling Long Sequences with Structured State Spaces
- URL: http://arxiv.org/abs/2111.00396v1
- Date: Sun, 31 Oct 2021 03:32:18 GMT
- Title: Efficiently Modeling Long Sequences with Structured State Spaces
- Authors: Albert Gu, Karan Goel, Christopher Ré
- Abstract summary: We propose the Structured State Space (S4) sequence model, based on a new parameterization of the fundamental state space model (SSM).
S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet.
- Score: 15.456254157293836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central goal of sequence modeling is designing a single principled model
that can address sequence data across a range of modalities and tasks,
particularly on long-range dependencies. Although conventional models including
RNNs, CNNs, and Transformers have specialized variants for capturing long
dependencies, they still struggle to scale to very long sequences of $10000$ or
more steps. A promising recent approach proposed modeling sequences by
simulating the fundamental state space model (SSM) \( x'(t) = Ax(t) + Bu(t),
y(t) = Cx(t) + Du(t) \), and showed that for appropriate choices of the state
matrix \( A \), this system could handle long-range dependencies mathematically
and empirically. However, this method has prohibitive computation and memory
requirements, rendering it infeasible as a general sequence modeling solution.
We propose the Structured State Space (S4) sequence model based on a new
parameterization for the SSM, and show that it can be computed much more
efficiently than prior approaches while preserving their theoretical strengths.
Our technique involves conditioning \( A \) with a low-rank correction,
allowing it to be diagonalized stably and reducing the SSM to the well-studied
computation of a Cauchy kernel. S4 achieves strong empirical results across a
diverse range of established benchmarks, including (i) 91\% accuracy on
sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with
a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on
image and language modeling tasks, while performing generation $60\times$
faster, and (iii) SoTA on every task from the Long Range Arena benchmark, including
solving the challenging Path-X task of length 16k that all prior work fails on,
while being as efficient as all competitors.
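To make the SSM above concrete, the sketch below (a minimal illustration, not the authors' implementation) discretizes \( x'(t) = Ax(t) + Bu(t),\ y(t) = Cx(t) + Du(t) \) with the bilinear transform and evaluates it in the two equivalent ways S4 exploits: as a step-by-step linear recurrence (cheap autoregressive generation) and as a convolution with a precomputed kernel (parallelizable training). The matrices and step size are random placeholders rather than the paper's HiPPO initialization and normal-plus-low-rank structure, and \( D \) is set to zero.

```python
import numpy as np

def discretize(A, B, step):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    Ab = inv @ (I + (step / 2.0) * A)   # discrete-time state matrix
    Bb = inv @ (step * B)               # discrete-time input matrix
    return Ab, Bb

def ssm_recurrence(Ab, Bb, C, u):
    """Recurrent view: x_k = Ab x_{k-1} + Bb u_k,  y_k = C x_k (D omitted)."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb[:, 0] * u_k
        ys.append(float(C[0] @ x))
    return np.array(ys)

def ssm_convolution(Ab, Bb, C, u):
    """Convolutional view: y = Kbar * u with kernel Kbar_k = C Ab^k Bb."""
    L = len(u)
    K = np.array([(C @ np.linalg.matrix_power(Ab, k) @ Bb).item() for k in range(L)])
    return np.convolve(u, K)[:L]        # causal convolution, truncated to length L

rng = np.random.default_rng(0)
N, L = 4, 16                            # toy state size and sequence length
A = rng.normal(size=(N, N)) - N * np.eye(N)   # shift spectrum left so the SSM is stable
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)

Ab, Bb = discretize(A, B, step=0.1)
assert np.allclose(ssm_recurrence(Ab, Bb, C, u),
                   ssm_convolution(Ab, Bb, C, u))   # both views agree
```

The recurrence costs O(1) work per generated step, while the convolutional view computes the whole output in parallel (via FFT for long sequences); S4's contribution is computing that kernel efficiently for its structured \( A \) through a Cauchy-kernel computation rather than the naive matrix powers used here.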
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
- Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a world model based on Mamba, a state space model (SSM).
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z)
- Oscillatory State-Space Models [61.923849241099184]
We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences.
A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model (a minimal scan sketch appears after this list).
We show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions.
arXiv Detail & Related papers (2024-10-04T22:00:13Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction into a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to address the lack of long-range dependencies.
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment, while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors [44.5740422079]
We show that pretraining with standard denoising objectives leads to dramatic gains across multiple architectures.
In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained.
arXiv Detail & Related papers (2023-10-04T17:17:06Z)
- Continuous-time convolutions model of event sequences [46.3471121117337]
Event sequences are non-uniform and sparse, making traditional models unsuitable.
We propose COTIC, a method based on an efficient convolution neural network designed to handle the non-uniform occurrence of events over time.
COTIC outperforms existing models in predicting the next event time and type, achieving an average rank of 1.5 compared to 3.714 for the nearest competitor.
arXiv Detail & Related papers (2023-02-13T10:34:51Z)
- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies [0.0]
We compare existing solutions to long-sequence modeling in terms of their pure mathematical formulation.
We then demonstrate that long context length does yield better performance, albeit application-dependent.
Inspired by emerging sparse models of huge capacity, we propose a machine learning system for handling million-scale dependencies.
arXiv Detail & Related papers (2023-02-13T09:47:31Z)
- Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)
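Several of the entries above (Drama's $O(n)$ Mamba-based world model, LinOSS's fast associative parallel scans) rely on the same observation: a linear state recurrence x_{k+1} = A_k x_k + b_k can be evaluated with an associative scan, so the sequential loop can be parallelized. The sketch below is a toy numpy illustration of that combine rule, with made-up shapes and values; it is not code from any of the listed papers.

```python
import numpy as np

def combine(first, second):
    """Compose two affine updates x -> A x + b; 'second' is applied after 'first'.
    This operation is associative, which is what makes a parallel scan possible."""
    A1, b1 = first
    A2, b2 = second
    return A2 @ A1, A2 @ b1 + b2

def scan(elems):
    """Inclusive scan over affine updates by divide and conquer.
    Element k of the result maps x_0 directly to x_{k+1}."""
    if len(elems) == 1:
        return elems
    mid = len(elems) // 2
    left, right = scan(elems[:mid]), scan(elems[mid:])
    carry = left[-1]
    return left + [combine(carry, e) for e in right]   # combines at one level are independent

rng = np.random.default_rng(0)
N, L = 3, 8                                  # toy state size and sequence length
As = [np.eye(N) * 0.9 for _ in range(L)]     # per-step transition matrices
bs = [rng.normal(size=N) for _ in range(L)]
x0 = rng.normal(size=N)

# Scan result vs. the plain sequential recurrence x_{k+1} = A_k x_k + b_k.
xs_scan = [A @ x0 + b for A, b in scan(list(zip(As, bs)))]
x, xs_seq = x0, []
for A, b in zip(As, bs):
    x = A @ x + b
    xs_seq.append(x)
assert all(np.allclose(a, b) for a, b in zip(xs_scan, xs_seq))
```

On parallel hardware the combines at each level of the recursion run independently, which is what lets scan-based SSMs train in parallel despite being recurrent.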
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.