Flash STU: Fast Spectral Transform Units
- URL: http://arxiv.org/abs/2409.10489v3
- Date: Mon, 23 Sep 2024 00:26:07 GMT
- Title: Flash STU: Fast Spectral Transform Units
- Authors: Y. Isabel Liu, Windsor Nguyen, Yagiz Devre, Evan Dogariu, Anirudha Majumdar, Elad Hazan
- Abstract summary: This paper describes an efficient, open source PyTorch implementation of the Spectral Transform Unit.
We investigate sequence prediction tasks over several modalities including language, robotics, and simulated dynamical systems.
- Score: 19.889367504937177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes an efficient, open source PyTorch implementation of the Spectral Transform Unit. We investigate sequence prediction tasks over several modalities including language, robotics, and simulated dynamical systems. We find that for the same parameter count, the STU and its variants outperform the Transformer as well as other leading state space models across various modalities.
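For context, the core computation in an STU layer is spectral filtering: the input sequence is convolved with a fixed bank of filters (the top eigenvectors of a particular Hankel matrix), and the resulting features are passed through a learned projection. The following is a minimal NumPy sketch of that idea only; the function names are illustrative and this is not the paper's PyTorch API.

```python
import numpy as np

def spectral_filters(seq_len, k):
    """Return the top-k eigenpairs of the Hankel matrix used in spectral
    filtering: Z[i, j] = 2 / ((i + j)^3 - (i + j)) with 1-based indices."""
    idx = np.arange(1, seq_len + 1, dtype=float)
    s = idx[:, None] + idx[None, :]
    Z = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)      # ascending eigenvalue order
    return eigvals[-k:], eigvecs[:, -k:]      # keep the top-k

def stu_features(u, filters):
    """Causally convolve each input channel with each spectral filter.

    u: (seq_len, d_in) input sequence; filters: (seq_len, k).
    Returns (seq_len, k * d_in) features, which an STU-style layer would
    then map through a learned linear projection (omitted here)."""
    L, d = u.shape
    k = filters.shape[1]
    n = 2 * L                                  # zero-pad to avoid wrap-around
    U = np.fft.rfft(u, n=n, axis=0)            # (n//2 + 1, d)
    F = np.fft.rfft(filters, n=n, axis=0)      # (n//2 + 1, k)
    conv = np.fft.irfft(U[:, None, :] * F[:, :, None], n=n, axis=0)[:L]
    return conv.reshape(L, k * d)              # (L, k, d) -> (L, k * d)
```

The FFT-based convolution is what makes this fast: the whole sequence is filtered in O(L log L) rather than the O(L^2) of a naive causal convolution.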
Related papers
- Efficient Multi-Instance Generation with Janus-Pro-Driven Prompt Parsing [53.295515505026096]
Janus-Pro-driven Prompt Parsing is a prompt-parsing module that bridges text understanding and layout generation.
MIGLoRA is a parameter-efficient plug-in integrating Low-Rank Adaptation into UNet (SD1.5) and DiT (SD3) backbones.
The proposed method achieves state-of-the-art performance on COCO and LVIS benchmarks while maintaining parameter efficiency.
arXiv Detail & Related papers (2025-03-27T00:59:14Z) - Automatically Learning Hybrid Digital Twins of Dynamical Systems [56.69628749813084]
Digital Twins (DTs) simulate the states and temporal dynamics of real-world systems.
DTs often struggle to generalize to unseen conditions in data-scarce settings.
In this paper, we propose an evolutionary algorithm, HDTwinGen, to autonomously propose, evaluate, and optimize hybrid digital twins (HDTwins).
arXiv Detail & Related papers (2024-10-31T07:28:22Z) - FACTS: A Factored State-Space Framework For World Modelling [24.08175276756845]
We propose a novel recurrent framework, the FACTored State-space (FACTS) model, for spatial-temporal world modelling.
The FACTS framework constructs a graph-memory with a routing mechanism that learns permutable memory representations.
It consistently outperforms or matches specialised state-of-the-art models, despite its general-purpose world modelling design.
arXiv Detail & Related papers (2024-10-28T11:04:42Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms [0.6718184400443239]
We propose an advanced architecture that mitigates challenges by decomposing A-multiplications into multiple groups.
Inspired by the "attention sink" phenomenon identified in streaming language models, we incorporate a similar mechanism to enhance the stability and performance of our model.
arXiv Detail & Related papers (2024-08-01T02:49:58Z) - SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering [5.016335384639901]
Multi-modal input of Audio-Visual Question Answering (AVQA) makes feature extraction and fusion processes more challenging.
We propose SHMamba: Structured Hyperbolic State Space Model to integrate the advantages of hyperbolic geometry and state space models.
Our method demonstrates superiority among all current major methods and is more suitable for practical application scenarios.
arXiv Detail & Related papers (2024-06-14T08:43:31Z) - A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
Stable Diffusion Model is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z) - Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism.
We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z) - Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling [10.246977481606427]
We study the mechanisms through which different components of Transformer, such as the dot-product self-attention, affect its expressive power.
Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads.
arXiv Detail & Related papers (2024-02-01T11:43:13Z) - Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - Hungry Hungry Hippos: Towards Language Modeling with State Space Models [17.412372994222114]
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling.
We make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention.
arXiv Detail & Related papers (2022-12-28T17:56:03Z) - Foundation Transformers [105.06915886136524]
We call for the development of Foundation Transformer for true general-purpose modeling.
In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal.
arXiv Detail & Related papers (2022-10-12T17:16:27Z) - Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios [18.08665164701404]
Recurrent State-space models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios.
We introduce Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors.
We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks, both on real-world systems and in simulation.
arXiv Detail & Related papers (2022-06-29T14:54:49Z) - Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems [32.86421107987556]
We build upon recent developments in analyzing deep neural networks as numerical solvers of ordinary differential equations.
We formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers.
We perform exhaustive experiments with TransEvolve on well-known encoder-decoder as well as encoder-only tasks.
arXiv Detail & Related papers (2021-09-30T14:01:06Z) - IOT: Instance-wise Layer Reordering for Transformer Structures [173.39918590438245]
We break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.
Our method can also be applied to other architectures beyond Transformer.
arXiv Detail & Related papers (2021-03-05T03:44:42Z) - Learning Discrete Energy-based Models via Auxiliary-variable Local
Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy model guided fuzzer for software testing that achieves comparable performance to well engineered fuzzing engines like libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z) - Multi-Unit Transformers for Neural Machine Translation [51.418245676894465]
We propose the Multi-Unit Transformers (MUTE) to promote the expressiveness of the Transformer.
Specifically, we use several parallel units and show that modeling with multiple units improves model performance and introduces diversity.
arXiv Detail & Related papers (2020-10-21T03:41:49Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.