What Makes Convolutional Models Great on Long Sequence Modeling?
- URL: http://arxiv.org/abs/2210.09298v1
- Date: Mon, 17 Oct 2022 17:53:29 GMT
- Title: What Makes Convolutional Models Great on Long Sequence Modeling?
- Authors: Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey
- Abstract summary: We focus on the structure of the convolution kernel and identify two critical but intuitive principles.
We propose a simple yet effective convolutional model called Structured Global Convolution (SGConv).
- Score: 30.50800981442449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional models have been widely used in multiple domains. However, most
existing models only use local convolution, making the model unable to handle
long-range dependency efficiently. Attention overcomes this problem by
aggregating global information but also makes the computational complexity
quadratic to the sequence length. Recently, Gu et al. [2021] proposed a model
called S4 inspired by the state space model. S4 can be efficiently implemented
as a global convolutional model whose kernel size equals the input sequence
length. S4 can model much longer sequences than Transformers and achieve
significant gains over SoTA on several long-range tasks. Despite its empirical
success, S4 is involved. It requires sophisticated parameterization and
initialization schemes. As a result, S4 is less intuitive and hard to use. Here
we aim to demystify S4 and extract basic principles that contribute to the
success of S4 as a global convolutional model. We focus on the structure of the
convolution kernel and identify two critical but intuitive principles enjoyed
by S4 that are sufficient to make up an effective global convolutional model:
1) The parameterization of the convolutional kernel needs to be efficient in
the sense that the number of parameters should scale sub-linearly with sequence
length. 2) The kernel needs to satisfy a decaying structure in which the weights
for convolving with closer neighbors are larger than those for more distant ones.
Based on the two principles, we propose a simple yet effective convolutional
model called Structured Global Convolution (SGConv). SGConv exhibits strong
empirical performance over several tasks: 1) With faster speed, SGConv
surpasses S4 on Long Range Arena and Speech Command datasets. 2) When plugging
SGConv into standard language and vision models, it shows the potential to
improve both efficiency and performance.
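
The two principles can be turned into a concrete sketch. The snippet below is a minimal illustration, not the authors' SGConv implementation: it assembles a global kernel from a fixed number of short sub-kernels, each dilated to cover a longer span and scaled by a decaying weight (so the parameter count grows roughly logarithmically with sequence length while nearby positions receive larger weights), and applies it with an FFT so the convolution costs O(L log L). The function names, the base length of 16, and the decay factor of 0.5 are illustrative assumptions.

```python
import numpy as np

def build_decaying_kernel(seq_len, base_len=16, decay=0.5, rng=None):
    # Principle 1 (sub-linear parameters): only base_len weights per scale,
    # and the number of scales grows logarithmically with seq_len.
    # Principle 2 (decaying structure): scale s is damped by decay**s, so
    # weights for nearby positions dominate those for distant ones.
    rng = np.random.default_rng(0) if rng is None else rng
    pieces, scale = [], 0
    while sum(p.size for p in pieces) < seq_len:
        sub = rng.standard_normal(base_len)        # stands in for learnable weights
        sub = np.repeat(sub, 2 ** scale)           # dilate to cover a longer span
        pieces.append((decay ** scale) * sub)      # damp the coarser, more distant scale
        scale += 1
    return np.concatenate(pieces)[:seq_len]        # one global kernel of length seq_len

def global_conv_fft(x, kernel):
    # Causal global convolution via FFT, O(L log L) like S4/SGConv.
    L = x.shape[-1]
    n = 2 * L                                      # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(kernel, n), n)
    return y[..., :L]

if __name__ == "__main__":
    L = 4096
    x = np.random.default_rng(1).standard_normal(L)
    k = build_decaying_kernel(L)                   # ~9 scales * 16 = 144 parameters for L = 4096
    print(k.shape, global_conv_fft(x, k).shape)    # (4096,) (4096,)
```

In a trained model the sub-kernels would be learnable parameters and the kernel would be rebuilt on every forward pass; the point here is only that a sub-linear parameterization plus an enforced decay are enough to define a global convolution kernel.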
Related papers
- Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling [13.627888191693712]
We present a novel approach to parameterizing global convolutional kernels for long-sequence modelling.
Our experiments demonstrate state-of-the-art performance on the Long Range Arena, Sequential CIFAR, and Speech Commands tasks.
We also report improved performance on ImageNet classification by replacing 2D convolutions with 1D MRConv layers.
arXiv Detail & Related papers (2024-08-18T12:20:03Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ that compresses the global abstraction into a fixed-length codebook via vector quantization.
LongVQ effectively maintains dynamic global and local patterns, which helps compensate for the missing long-range dependencies.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z) - Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
arXiv Detail & Related papers (2023-10-02T23:36:13Z) - Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z) - MALUNet: A Multi-Attention and Light-weight UNet for Skin Lesion Segmentation [13.456935850832565]
We propose a light-weight model to achieve competitive performances for skin lesion segmentation at the lowest cost of parameters and computational complexity.
We combine four modules with our U-shape architecture and obtain a light-weight medical image segmentation model dubbed MALUNet.
Compared with UNet, our model improves the mIoU and DSC metrics by 2.39% and 1.49%, respectively, with a 44x and 166x reduction in the number of parameters and computational complexity.
arXiv Detail & Related papers (2022-11-03T13:19:22Z) - Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z) - Long Range Language Modeling via Gated State Spaces [67.64091993846269]
We focus on autoregressive sequence modeling over English books, Github source code and ArXiv mathematics articles.
We propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4.
arXiv Detail & Related papers (2022-06-27T01:50:18Z) - On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension.
arXiv Detail & Related papers (2022-06-23T17:58:39Z) - Diagonal State Spaces are as Effective as Structured State Spaces [3.8276199743296906]
We show that our Diagonal State Space (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification with the Speech Commands dataset, while being conceptually simpler and straightforward to implement.
In this work, we show that one can match the performance of S4 even without the low-rank correction, thus assuming the state matrices to be diagonal; a minimal sketch of such a diagonal-SSM kernel appears after this list.
arXiv Detail & Related papers (2022-03-27T16:30:33Z) - Efficiently Modeling Long Sequences with Structured State Spaces [15.456254157293836]
We propose a new sequence model based on a new parameterization for the fundamental state space model.
S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet.
arXiv Detail & Related papers (2021-10-31T03:32:18Z)
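
Several of the diagonal state-space entries above (DSS, diagonal S4) rest on the same observation: with a diagonal state matrix, the SSM's convolution kernel is a sum of decaying complex exponentials and can be materialized directly. The sketch below, referenced from the DSS entry, illustrates that construction; the initialization, step size, and variable names are illustrative assumptions rather than any paper's exact scheme.

```python
import numpy as np

def diagonal_ssm_kernel(seq_len, state_dim=64, dt=1e-2, rng=None):
    # Convolution kernel of a diagonal SSM  x'(t) = Lambda x(t) + B u(t),  y = Re(C x),
    # materialized as K[k] = Re( sum_n C_n * B_n * exp(Lambda_n * dt * k) ).
    rng = np.random.default_rng(0) if rng is None else rng
    lam = -0.5 + 1j * np.pi * np.arange(state_dim)   # stable diagonal state matrix (negative real part)
    B = np.ones(state_dim)
    C = rng.standard_normal(state_dim) + 1j * rng.standard_normal(state_dim)
    steps = np.arange(seq_len)
    V = np.exp(lam[:, None] * dt * steps[None, :])   # Vandermonde-style (state_dim, seq_len) matrix
    return ((C * B) @ V).real                        # sum over the state dimension -> real kernel

if __name__ == "__main__":
    K = diagonal_ssm_kernel(1024)
    print(K.shape)                                   # (1024,)
    # K can be applied with the same FFT-based global convolution sketched above.
```

Note that the decay principle from the main abstract appears automatically here: the kernel magnitude shrinks like exp(Re(lambda) * dt * k), so weights for nearby positions are larger than those for distant ones.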
This list is automatically generated from the titles and abstracts of the papers on this site.