Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
- URL: http://arxiv.org/abs/2404.16112v1
- Date: Wed, 24 Apr 2024 18:10:31 GMT
- Title: Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
- Authors: Badri Narayana Patro, Vijay Srinivas Agneeswaran
- Abstract summary: State Space Models (SSMs) have emerged as promising alternatives for sequence modeling paradigms.
This survey highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis.
- Score: 1.4408339076385341
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks such as machine translation and Named Entity Recognition (NER). However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations that use spectral networks or convolutions have been proposed to address these issues and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling paradigms in this context, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, sstv2, as well as video datasets such as Breakfast, COIN, LVU, and various time series datasets. The project page for the Mamba-360 work is available at https://github.com/badripatro/mamba360.
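The contrast the abstract draws between $O(N^2)$ attention and SSMs can be illustrated with a toy example. The following is a minimal sketch, not taken from the survey, assuming a time-invariant diagonal SSM with the recurrence x_k = a * x_{k-1} + b * u_k and output y_k = c . x_k. The function names and parameter values are hypothetical. The sketch shows that the same model can be run as a step-by-step recurrence, whose cost grows linearly with sequence length N, or as a causal convolution with kernel K_j = sum_i c_i * a_i^j * b_i. The convolution is written naively here for clarity, so this version is itself quadratic in N.

```python
def ssm_recurrent(a, b, c, u):
    """Run a diagonal linear SSM one step at a time.

    Update: x_k = a * x_{k-1} + b * u_k (elementwise), output y_k = sum(c * x_k).
    Cost is O(N * d) for N inputs and state size d.
    """
    state = [0.0] * len(a)
    outputs = []
    for u_k in u:
        state = [a_i * x_i + b_i * u_k for a_i, b_i, x_i in zip(a, b, state)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


def ssm_convolution(a, b, c, u):
    """Compute the same outputs as a causal convolution.

    Kernel: K_j = sum_i c_i * a_i**j * b_i, output y_k = sum_{j<=k} K_j * u_{k-j}.
    Written naively for clarity (this loop is O(N^2)).
    """
    n = len(u)
    kernel = [
        sum(c_i * (a_i ** j) * b_i for a_i, b_i, c_i in zip(a, b, c))
        for j in range(n)
    ]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(n)]


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    a = [0.9, 0.5]     # per-channel state decay
    b = [1.0, 1.0]     # input projection
    c = [0.5, -0.25]   # output projection
    u = [1.0, 0.0, 2.0, -1.0]
    print(ssm_recurrent(a, b, c, u))
    print(ssm_convolution(a, b, c, u))
```

The two functions agree up to floating-point rounding. The equivalence relies on a, b, and c being fixed across time steps. Selective models such as Mamba make these parameters input-dependent, so the fixed-kernel convolution view no longer applies and the recurrence is computed with a scan instead.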
Related papers
- TIMBA: Time series Imputation with Bi-directional Mamba Blocks and Diffusion models [0.0]
We propose replacing time-oriented Transformers with State-Space Models (SSMs).
We develop a model that integrates SSM, Graph Neural Networks, and node-oriented Transformers to achieve enhanced representations.
arXiv Detail & Related papers (2024-10-08T11:10:06Z)
- Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
arXiv Detail & Related papers (2024-07-19T11:12:08Z)
- Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis [8.115549269867403]
State Space Models (SSMs) have garnered immense interest lately in sequential modeling and visual representation learning.
Capitalizing on the advances in computer vision, medical imaging has heralded a new epoch with Mamba models.
arXiv Detail & Related papers (2024-06-05T16:29:03Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Vision Mamba: A Comprehensive Survey and Taxonomy [11.025533218561284]
A State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems.
Based on the latest state-space models, Mamba merges time-varying parameters into SSMs and formulates a hardware-aware algorithm for efficient training and inference.
Mamba is expected to become a new AI architecture that may outperform Transformers.
arXiv Detail & Related papers (2024-05-07T15:30:14Z)
- State Space Model for New-Generation Network Alternative to Transformers: A Survey [52.812260379420394]
In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks.
To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods.
Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years.
arXiv Detail & Related papers (2024-04-15T07:24:45Z)
- SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding [50.337896542603524]
We introduce SpikeMba: a multi-modal spiking saliency mamba for temporal video grounding.
Our approach integrates Spiking Neural Networks (SNNs) with state space models (SSMs) to leverage their unique advantages.
Our experiments demonstrate the effectiveness of SpikeMba, which consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-04-01T15:26:44Z)
- SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series [2.4379295576598436]
We propose SiMBA, a new architecture that introduces Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations and uses the Mamba block for sequence modeling.
We show that SiMBA outperforms existing SSMs, bridging the performance gap with state-of-the-art transformers.
arXiv Detail & Related papers (2024-03-22T17:22:56Z)
- Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data [26.457571615782985]
Mamba, based on state space models, has been shown to achieve comparable performance for modeling text sequences.
We present Mamba-ND, a generalized design extending the Mamba architecture to arbitrary multi-dimensional data.
We show that Mamba-ND demonstrates performance competitive with the state-of-the-art on a variety of multi-dimensional benchmarks.
arXiv Detail & Related papers (2024-02-08T18:30:50Z)
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Long Range Arena: A Benchmark for Efficient Transformers [115.1654897514089]
The Long Range Arena benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens.
We systematically evaluate ten well-established long-range Transformer models on our newly proposed benchmark suite.
arXiv Detail & Related papers (2020-11-08T15:53:56Z)