Linearizing Transformer with Key-Value Memory Bank
- URL: http://arxiv.org/abs/2203.12644v1
- Date: Wed, 23 Mar 2022 18:10:18 GMT
- Title: Linearizing Transformer with Key-Value Memory Bank
- Authors: Yizhe Zhang, Deng Cai
- Abstract summary: We propose MemSizer, an approach that projects the source sequence into a lower-dimensional representation.
MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation.
We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer.
- Score: 54.83663647680612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer has brought great success to a wide range of natural language
processing tasks. Nevertheless, the computational overhead of the vanilla
transformer scales quadratically with sequence length. Many efforts have been
made to develop more efficient transformer variants. A line of work (e.g.,
Linformer) projects the input sequence into a low-rank space, achieving linear
time complexity. However, Linformer is not well suited to text generation
tasks, as the sequence length must be pre-specified. We propose MemSizer, an
approach that also projects the source sequence into a lower-dimensional
representation but can take input of dynamic length, by taking a different
perspective on the attention mechanism. MemSizer not only achieves the same linear time complexity
but also enjoys efficient recurrent-style autoregressive generation, which
yields constant memory complexity and reduced computation at inference. We
demonstrate that MemSizer provides an improved tradeoff between efficiency and
accuracy over the vanilla transformer and other linear variants in language
modeling and machine translation tasks, revealing a viable direction towards
further inference efficiency improvement.
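The abstract's claim of recurrent-style generation with constant memory can be illustrated with a minimal sketch. Note that this is not MemSizer's formulation, which the abstract does not spell out; it is the generic kernelized linear-attention recurrence, shown only to make the idea concrete. The feature map (elu(x) + 1) and all function names are assumptions chosen for illustration. The recurrent version keeps a fixed-size state regardless of how many tokens have been generated, and matches the quadratic causal reference.

```python
import math
import random


def feature_map(vec):
    # Positive feature map elu(x) + 1 (an assumption, not taken from the paper).
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_quadratic(queries, keys, values):
    # Reference: each step re-reads all previous keys and values, O(n^2) overall.
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [dot(phi_q, feature_map(keys[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(d_v)]
        )
    return outputs


def attention_recurrent(queries, keys, values):
    # Recurrent form: the state is a d x d_v matrix plus a d-vector,
    # so memory does not grow with the number of generated tokens.
    d, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d)]
    norm = [0.0] * d
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = dot(phi_q, norm)
        outputs.append(
            [sum(phi_q[i] * state[i][j] for i in range(d)) / denom
             for j in range(d_v)]
        )
    return outputs


def random_matrix(rows, cols, rng):
    # Hypothetical helper for generating toy inputs.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

Calling both functions on the same toy inputs, for example three matrices built with `random_matrix` from one `random.Random(0)` generator, should give the same outputs up to floating-point error, while only the recurrent version avoids revisiting earlier tokens.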
Related papers
- Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers [24.109312575970456]
We propose a simple framework to enable off-the-shelf pre-trained transformers to process much longer sequences.
Our method divides each long-sequence input into a batch of chunks, then aligns the inter-chunk information during the encoding steps.
We learn an effective hidden selection policy, which regards the decoders of transformers as environments.
arXiv Detail & Related papers (2023-08-25T05:52:05Z)
- Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) is a blockwise computation of self-attention and feedforward network fusion to minimize memory costs.
By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods.
arXiv Detail & Related papers (2023-05-30T19:25:51Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Scaling Transformer to 1M tokens and beyond with RMT [5.60052250541419]
A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size.
In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute.
Our approach demonstrates the capability to store information in memory for sequences of up to an unprecedented two million tokens while maintaining high retrieval accuracy.
arXiv Detail & Related papers (2023-04-19T16:18:54Z)
- Recurrent Memory Transformer [0.3529736140137003]
We study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer).
We implement a memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence.
Our model performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing.
arXiv Detail & Related papers (2022-07-14T13:00:22Z)
- Block-Recurrent Transformers [49.07682696216708]
We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence.
Our recurrent cell operates on blocks of tokens rather than single tokens, and leverages parallel computation within a block in order to make efficient use of accelerator hardware.
arXiv Detail & Related papers (2022-03-11T23:44:33Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- TurboTransformers: An Efficient GPU Serving System For Transformer Models [17.4637724940437]
The TurboTransformers system consists of a computing runtime and a serving framework.
An efficient parallel algorithm is proposed for GPU-based batch reduction operations.
A memory allocation algorithm is designed for variable-length input situations.
A serving framework equipped with a new batch scheduler achieves the optimal throughput on variable-length requests.
arXiv Detail & Related papers (2020-10-09T07:28:38Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.