LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units
- URL: http://arxiv.org/abs/2402.04882v1
- Date: Sat, 20 Jan 2024 01:10:18 GMT
- Title: LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units
- Authors: Zeyu Liu, Gourav Datta, Anni Li, Peter Anthony Beerel
- Abstract summary: Transformer models have demonstrated high accuracy in numerous applications but have high complexity and lack sequential processing capability.
We show how architectural modifications to a recurrent model can help push its performance toward that of Transformer models.
We present a spiking version of this architecture, which introduces the benefit of states within the patch embedding and channel mixer modules.
- Score: 5.830814457423021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer models have demonstrated high accuracy in numerous
applications, but their high complexity and lack of sequential processing
capability make them ill-suited for many streaming applications at the edge,
where devices are heavily resource-constrained. Thus motivated, many
researchers have proposed reformulating transformer models as RNN modules that
modify the self-attention computation with explicit states. However, these
approaches often incur significant performance degradation. The ultimate goal
is to develop a model with the following properties: parallel training,
streaming and low-cost inference, and SOTA performance. In this paper, we
propose a new direction to achieve this goal. We show how architectural
modifications to a recurrent model can help push its performance toward that
of Transformer models while retaining its sequential processing capability.
Specifically, inspired by the recent success of Legendre Memory Units (LMU) in
sequence learning tasks, we propose LMUFormer, which augments the LMU with a
convolutional patch embedding and a convolutional channel mixer. Moreover, we
present a spiking version of this architecture, which introduces the benefit
of states within the patch embedding and channel mixer modules while
simultaneously reducing the computational complexity. We evaluated our
architectures on multiple sequence datasets. Compared to SOTA
transformer-based models within the ANN domain on the SCv2 dataset, our
LMUFormer achieves comparable performance while requiring 53x fewer parameters
and 65x fewer FLOPs. Additionally, because our model can process data in real
time, we can reduce the sequence length by 32.03% with a negligible drop in
performance. Our code is publicly available at
https://github.com/zeyuliu1037/LMUFormer.git.
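As a concrete illustration of the architecture described in the abstract, the following is a minimal, non-spiking PyTorch-style sketch of an LMUFormer-like block: a convolutional patch embedding, an LMU sequence mixer, and a convolutional (pointwise) channel mixer. The module names, dimensions, the Euler discretization, and the reduced LMU cell (scalar memory drive, no hidden-to-memory recurrence) are illustrative assumptions, not the authors' implementation; the linked repository is the authoritative reference.

```python
# Illustrative sketch only (not the authors' code): a non-spiking LMUFormer-style
# block with a convolutional patch embedding, an LMU sequence mixer, and a
# convolutional channel mixer. Sizes, the Euler discretization, and the reduced
# LMU cell are assumptions made for clarity.
import numpy as np
import torch
import torch.nn as nn


def lmu_state_space(memory_size: int, theta: float):
    """Continuous-time (A, B) of the Legendre memory (standard LMU construction)."""
    q = np.arange(memory_size, dtype=np.float64)
    r = (2 * q + 1)[:, None] / theta
    i, j = np.meshgrid(q, q, indexing="ij")
    A = r * np.where(i < j, -1.0, (-1.0) ** (i - j + 1))
    B = r * ((-1.0) ** q)[:, None]
    return A, B


class LMUCell(nn.Module):
    """Simplified LMU: scalar memory drive u_t, LTI memory m_t, ReLU hidden state."""

    def __init__(self, input_dim, hidden_dim, memory_size=64, theta=784.0, dt=1.0):
        super().__init__()
        A, B = lmu_state_space(memory_size, theta)
        # Euler discretization of the LTI system (ZOH is another common choice).
        self.register_buffer("A", torch.tensor(np.eye(memory_size) + dt * A, dtype=torch.float32))
        self.register_buffer("B", torch.tensor(dt * B[:, 0], dtype=torch.float32))
        self.e_x = nn.Linear(input_dim, 1, bias=False)   # input -> memory drive
        self.W_x = nn.Linear(input_dim, hidden_dim, bias=False)
        self.W_m = nn.Linear(memory_size, hidden_dim, bias=False)

    def forward(self, x):                                # x: (batch, time, input_dim)
        b, T, _ = x.shape
        m = x.new_zeros(b, self.A.shape[0])
        outs = []
        for t in range(T):                               # streaming / sequential form
            u = self.e_x(x[:, t])                        # (batch, 1)
            m = m @ self.A.T + u * self.B                # m_t = A m_{t-1} + B u_t
            outs.append(torch.relu(self.W_x(x[:, t]) + self.W_m(m)))
        return torch.stack(outs, dim=1)                  # (batch, time, hidden_dim)


class LMUFormerBlock(nn.Module):
    """Conv patch embedding -> LMU sequence mixer -> conv (pointwise) channel mixer."""

    def __init__(self, in_channels, d_model, memory_size=64, theta=784.0):
        super().__init__()
        self.patch_embed = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
        )
        self.lmu = LMUCell(d_model, d_model, memory_size, theta)
        self.channel_mixer = nn.Sequential(              # 1x1 convs mix channels per step
            nn.Conv1d(d_model, 4 * d_model, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(4 * d_model, d_model, kernel_size=1),
        )

    def forward(self, x):                                # x: (batch, in_channels, time)
        z = self.patch_embed(x)                          # (batch, d_model, time')
        h = self.lmu(z.transpose(1, 2)).transpose(1, 2)  # sequence mixing over time
        return h + self.channel_mixer(h)                 # residual channel mixing
```

The spiking version mentioned in the abstract would, roughly speaking, replace the ReLU activations with stateful spiking neurons and binarize the activations passed between modules; that variant is omitted from this sketch.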
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines [17.539008562641303]
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers.
The next frontier is LLM personalization, where a foundation model can be fine-tuned with user- or task-specific data.
Fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands.
arXiv Detail & Related papers (2024-09-23T20:14:09Z) - LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones [10.435069781620957]
Research in efficient vision backbones is evolving into models that are a mixture of convolutions and transformer blocks.
We analyze common modules and architectural design choices for backbones not in terms of MACs, but rather in terms of actual throughput and latency.
We combine both macro and micro design to create a new family of hardware-efficient backbone networks called LowFormer.
arXiv Detail & Related papers (2024-09-05T12:18:32Z) - Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependencies.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Mamba: Linear-Time Sequence Modeling with Selective State Spaces [31.985243136674146]
Foundation models are almost universally based on the Transformer architecture and its core attention module.
We identify that a key weakness of such models is their inability to perform content-based reasoning.
We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba).
As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.
arXiv Detail & Related papers (2023-12-01T18:01:34Z) - Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z) - RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z) - Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose a graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges.
Considering the redundancy in existing architectures, we first utilize the mode approximation to generate 0.1M trainable parameters to implement the multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z) - Parallelizing Legendre Memory Unit Training [5.076419064097734]
A new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets.
Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training.
We show that this parallelization-friendly reformulation, which can be applied generally to any deep network whose recurrent components are linear, makes training up to 200 times faster (see the sketch after this list).
arXiv Detail & Related papers (2021-02-22T23:43:47Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
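Because the "Parallelizing Legendre Memory Unit Training" entry above is directly relevant to how the LMU used in LMUFormer supports parallel training, here is a small NumPy sketch of the underlying idea, assuming the standard LTI memory update m_t = A m_{t-1} + B u_t with zero initial state. The dense Toeplitz matrix product below is an O(T^2) illustration; a practical implementation would typically use a causal convolution or FFT, and the function names are hypothetical.

```python
# Sketch (assumptions noted above): evaluating an LTI memory for all time steps
# at once, instead of through a step-by-step recurrence.
import numpy as np


def lti_memory_parallel(A, B, u):
    """m_t = A m_{t-1} + B u_t with m_{-1} = 0, for t = 0..T-1, computed without
    a sequential scan over the state. A: (d, d), B: (d,), u: (T,). Returns (T, d)."""
    T, d = u.shape[0], A.shape[0]
    # Impulse response H[k] = A^k B depends only on (A, B), so it is computed
    # once and reused; it is not part of the per-sequence recurrence.
    H = np.empty((T, d))
    h = B.astype(float).copy()
    for k in range(T):
        H[k] = h
        h = A @ h
    # m_t = sum_{k<=t} u_k * H[t-k]: a causal Toeplitz matmul over time, so every
    # time step is computed independently given H -- this is what makes training parallel.
    idx = np.subtract.outer(np.arange(T), np.arange(T))    # idx[t, k] = t - k
    U = np.where(idx >= 0, u[np.clip(idx, 0, None)], 0.0)  # causal Toeplitz of u
    return U @ H


def lti_memory_sequential(A, B, u):
    """Reference recurrence for checking the parallel form."""
    m, out = np.zeros(A.shape[0]), []
    for u_t in u:
        m = A @ m + B * u_t
        out.append(m)
    return np.stack(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T = 4, 16
    A = 0.5 * rng.standard_normal((d, d))
    B = rng.standard_normal(d)
    u = rng.standard_normal(T)
    assert np.allclose(lti_memory_parallel(A, B, u), lti_memory_sequential(A, B, u))
```

The key point is that the memory update contains no nonlinearity, so all time steps follow from the precomputed impulse response at once during training, while inference can still run step by step in streaming fashion.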