Convolutional State Space Models for Long-Range Spatiotemporal Modeling
- URL: http://arxiv.org/abs/2310.19694v1
- Date: Mon, 30 Oct 2023 16:11:06 GMT
- Title: Convolutional State Space Models for Long-Range Spatiotemporal Modeling
- Authors: Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman,
Wonmin Byeon
- Abstract summary: ConvS5 is an efficient ConvSSM variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
- Score: 65.0993000439043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectively modeling long spatiotemporal sequences is challenging due to the
need to model complex spatial correlations and long-range temporal dependencies
simultaneously. ConvLSTMs attempt to address this by updating tensor-valued
states with recurrent neural networks, but their sequential computation makes
them slow to train. In contrast, Transformers can process an entire
spatiotemporal sequence, compressed into tokens, in parallel. However, the cost
of attention scales quadratically in length, limiting their scalability to
longer sequences. Here, we address the challenges of prior methods and
introduce convolutional state space models (ConvSSM) that combine the tensor
modeling ideas of ConvLSTM with the long sequence modeling approaches of state
space methods such as S4 and S5. First, we demonstrate how parallel scans can
be applied to convolutional recurrences to achieve subquadratic parallelization
and fast autoregressive generation. We then establish an equivalence between
the dynamics of ConvSSMs and SSMs, which motivates parameterization and
initialization strategies for modeling long-range dependencies. The result is
ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling.
ConvS5 significantly outperforms Transformers and ConvLSTM on a long horizon
Moving-MNIST experiment while training 3X faster than ConvLSTM and generating
samples 400X faster than Transformers. In addition, ConvS5 matches or exceeds
the performance of state-of-the-art methods on challenging DMLab, Minecraft and
Habitat prediction benchmarks and enables new directions for modeling long
spatiotemporal sequences.
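The parallel-scan idea in the abstract can be illustrated with a minimal sketch: a linear recurrence x_t = a_t * x_{t-1} + b_t admits an associative combine operator, so all prefix states can be computed with any parallel scan primitive (e.g. a Blelloch scan) in logarithmic depth. The NumPy sketch below is an assumption-laden simplification, not the paper's implementation: it uses an elementwise (diagonal) state transition, whereas ConvS5 replaces the multiplication by a convolution; the scan itself is run sequentially here purely to check the operator's correctness.

```python
import numpy as np

def combine(e1, e2):
    """Associative operator for the linear recurrence x_t = a_t * x_{t-1} + b_t.

    Composing (a1, b1) then (a2, b2) yields the single affine map
    x -> a2*a1*x + a2*b1 + b2, which is what makes a parallel scan applicable.
    """
    a1, b1 = e1
    a2, b2 = e2
    return (a2 * a1, a2 * b1 + b2)

def prefix_scan(elems):
    """Inclusive prefix scan with `combine`. Run sequentially here for clarity;
    associativity means a parallel scan (e.g. jax.lax.associative_scan)
    computes the same prefixes in O(log T) depth."""
    out = [elems[0]]
    for e in elems[1:]:
        out.append(combine(out[-1], e))
    return out

rng = np.random.default_rng(0)
T, d = 16, 4
a = rng.uniform(0.5, 0.9, size=(T, d))   # elementwise (diagonal) transition -- simplifying assumption
b = rng.normal(size=(T, d))              # input contribution B u_t at each step

# Reference: the plain sequential recurrence from x_0 = 0
x = np.zeros(d)
seq = []
for t in range(T):
    x = a[t] * x + b[t]
    seq.append(x.copy())

# Scan-based computation of the same states
scan = prefix_scan(list(zip(a, b)))
states = [s[1] for s in scan]   # second component of each prefix is x_t
assert all(np.allclose(s, r) for s, r in zip(seq, states))
```

The second component of each scanned pair equals the recurrent state, so the recurrence can be evaluated by any parallel scan over `combine`, which is the source of the subquadratic parallelization the abstract describes.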
Related papers
- Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z)
- Machine-Learned Closure of URANS for Stably Stratified Turbulence: Connecting Physical Timescales & Data Hyperparameters of Deep Time-Series Models [0.0]
We develop time-series machine learning (ML) methods for closure modeling of the Unsteady Reynolds Averaged Navier Stokes equations.
We consider decaying SST, which is homogeneous and stably stratified by a uniform density gradient.
We find that the ratio of the timescales of the minimum information required by the ML models to accurately capture the dynamics of the SST corresponds to the Reynolds number of the flow.
arXiv Detail & Related papers (2024-04-24T18:58:00Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps address the lack of long-range dependency modeling.
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
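The "convolutional representation" that S4-style models (and LS4) exploit can be sketched concretely: a linear SSM x_t = A x_{t-1} + B u_t, y_t = C x_t produces outputs equal to a causal convolution of the input with the kernel K_k = C A^k B. The NumPy check below uses small random matrices with an assumed stable diagonal A; the specific shapes and values are illustrative only, not drawn from the LS4 paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 32
A = np.diag(rng.uniform(0.3, 0.8, n))   # stable diagonal state matrix (assumption for the sketch)
B = rng.normal(size=(n, 1))
C = rng.normal(size=(1, n))
u = rng.normal(size=T)

# Convolution kernel: K_k = C A^k B
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(T)])

# Recurrent computation: x_t = A x_{t-1} + B u_t, y_t = C x_t
x = np.zeros((n, 1))
y_rec = []
for t in range(T):
    x = A @ x + B * u[t]
    y_rec.append((C @ x).item())

# Equivalent causal convolution: y_t = sum_k K_k u_{t-k}
y_conv = [sum(K[k] * u[t - k] for k in range(t + 1)) for t in range(T)]
assert np.allclose(y_rec, y_conv)
```

Because the whole output sequence is one convolution with a precomputable kernel, it can be evaluated in parallel (e.g. via FFT), which is the speedup mechanism the summary refers to.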
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Traversing Time with Multi-Resolution Gaussian Process State-Space Models [17.42262122708566]
We propose a novel Gaussian process state-space architecture composed of multiple components, each trained on a different resolution, to model effects on different timescales.
We benchmark our novel method on semi-synthetic data and on an engine modeling task.
In both experiments, our approach compares favorably against its state-of-the-art alternatives that operate on a single time-scale only.
arXiv Detail & Related papers (2021-12-06T18:39:27Z)
- Alternating ConvLSTM: Learning Force Propagation with Alternate State Updates [29.011464047344614]
We introduce the alternating convolutional Long Short-Term Memory (Alt-ConvLSTM) that models the force propagation mechanisms in a deformable object with near-uniform material properties.
We demonstrate how this novel scheme imitates the alternate updates of the first and second-order terms in the forward method of numerical PDE solvers.
We validate our Alt-ConvLSTM on human soft tissue simulation with thousands of particles and consistent body pose changes.
arXiv Detail & Related papers (2020-06-14T06:43:33Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.