On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages
- URL: http://arxiv.org/abs/2412.19350v1
- Date: Thu, 26 Dec 2024 20:53:04 GMT
- Title: On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages
- Authors: Aleksandar Terzić, Michael Hersche, Giacomo Camposampiero, Thomas Hofmann, Abu Sebastian, Abbas Rahimi
- Abstract summary: Selective state-space models (SSMs) are an emerging alternative to the Transformer.
We analyze their expressiveness and length generalization performance on regular language tasks.
We introduce the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization.
- Abstract: Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a set of various regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates a convex combination of dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. We then proceed to evaluate variants of diagonal selective SSMs by considering their empirical performance on commutative and non-commutative automata. We explain the experimental results with theoretical considerations. Our code is available at https://github.com/IBM/selective-dense-state-space-model.
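The abstract fully specifies the SD-SSM computation at a high level: a dictionary of dense transition matrices, a per-step softmax selection that forms a convex combination of them, and a readout of layer normalization followed by a linear map. A minimal sketch of that recurrence, assuming PyTorch, is given below; the class name, hyperparameters, and the embedding-based selector are our illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SDSSMSketch(nn.Module):
    def __init__(self, vocab_size: int, state_dim: int, num_matrices: int, num_classes: int):
        super().__init__()
        # Dictionary of dense (not diagonal) state-transition matrices.
        self.dictionary = nn.Parameter(
            torch.randn(num_matrices, state_dim, state_dim) / state_dim**0.5
        )
        # Selection network: maps each input token to logits over the dictionary.
        self.selector = nn.Embedding(vocab_size, num_matrices)
        # Readout: layer normalization followed by a linear map, as in the abstract.
        self.readout = nn.Sequential(nn.LayerNorm(state_dim), nn.Linear(state_dim, num_classes))
        self.h0 = nn.Parameter(torch.randn(state_dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids of automaton inputs.
        batch, seq_len = tokens.shape
        # Softmax selection -> convex combination of dictionary matrices per step.
        alphas = torch.softmax(self.selector(tokens), dim=-1)        # (B, T, K)
        A = torch.einsum("btk,kij->btij", alphas, self.dictionary)   # (B, T, D, D)
        h = self.h0.expand(batch, -1)                                # (B, D)
        for t in range(seq_len):  # sequential form; training can use a parallel scan
            h = torch.einsum("bij,bj->bi", A[:, t], h)
        return self.readout(h)
```

For FSA emulation as described in the abstract, `num_classes` would correspond to the automaton's states and the model is read out after the final step; the use of dense, possibly non-commuting transition matrices is what separates this design from the diagonal selective SSM variants the paper evaluates.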
Related papers
- Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
State-space models (SSMs) and transformers dominate the language modeling landscape.
We propose implicit SSMs, which iterate a transformation until convergence to a fixed point.
Our approach demonstrates superior state-tracking capabilities on regular languages, surpassing transformers and SSMs.
arXiv Detail & Related papers (2025-02-10T19:59:31Z)
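The entry above rests on a single mechanism: iterating a transformation until it reaches a fixed point. A toy sketch of that idea follows; the tanh update, tolerance, and shapes are our own illustrative assumptions, not the paper's architecture.

```python
import torch

def implicit_step(x, W, U, b, tol=1e-5, max_iters=50):
    # Iterate h <- tanh(W h + U x + b) until the state (approximately) stops changing.
    h = torch.zeros(W.shape[0])
    for _ in range(max_iters):
        h_next = torch.tanh(W @ h + U @ x + b)
        if torch.norm(h_next - h) < tol:
            return h_next
        h = h_next
    return h
```

Convergence of such an iteration is guaranteed when the update is contractive, e.g. when the spectral norm of `W` is below 1, since tanh is 1-Lipschitz.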
- SeRpEnt: Selective Resampling for Expressive State Space Models
State Space Models (SSMs) have recently enjoyed a rise to prominence in the field of deep learning for sequence modeling.
We show how selective time intervals in Mamba act as linear approximators of information.
We propose our SeRpEnt architecture, a SSM that further exploits selectivity to compress sequences in an information-aware fashion.
arXiv Detail & Related papers (2025-01-20T20:27:50Z)
- Recursive Learning of Asymptotic Variational Objectives
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data.
Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states.
This approach is more theoretically well-founded than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z)
- Longhorn: State Space Models are Amortized Online Learners
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
arXiv Detail & Related papers (2024-07-19T11:12:08Z)
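The closed-form update mentioned above is in the spirit of online learning for associative recall. A rough delta-rule sketch of that general idea follows, with the caveat that Longhorn's actual closed-form solution differs in its details; the learning rate and shapes are assumptions for illustration.

```python
import torch

def recall_update(S, k, v, lr=0.5):
    # One online step: nudge the memory matrix S so that S @ k moves toward v.
    error = v - S @ k                      # prediction error for this key/value pair
    return S + lr * torch.outer(error, k)  # rank-one correction in the key direction
```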
- From Generalization Analysis to Optimization Designs for State Space Models
A State Space Model (SSM) is a foundation model in time series analysis.
We propose improvements to training algorithms based on the generalization results.
arXiv Detail & Related papers (2024-05-04T13:58:03Z)
- Theoretical Foundations of Deep Selective State-Space Models
Deep SSMs demonstrate outstanding performance across a diverse set of domains.
Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states, the resulting architecture can surpass attention-based foundation models in accuracy and efficiency.
We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
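As a concrete reading of "random linear recurrences equipped with simple input-controlled transitions," here is a small sketch; the sigmoid gate and projection shapes are assumptions for illustration only, not the paper's construction.

```python
import torch

def input_controlled_scan(xs, a_proj, B):
    # h_t = sigmoid(a_proj @ x_t) * h_{t-1} + B @ x_t, scanned over the sequence.
    h = torch.zeros(B.shape[0])
    states = []
    for x in xs:
        gate = torch.sigmoid(a_proj @ x)  # input-dependent diagonal transition
        h = gate * h + B @ x
        states.append(h)
    return torch.stack(states)
```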
- RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task.
Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)