Unlimiformer: Long-Range Transformers with Unlimited Length Input
- URL: http://arxiv.org/abs/2305.01625v3
- Date: Mon, 30 Oct 2023 19:44:47 GMT
- Title: Unlimiformer: Long-Range Transformers with Unlimited Length Input
- Authors: Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley
- Abstract summary: Unlimiformer is a general approach that wraps any existing pretrained encoder-decoder transformer.
It offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index.
We show that Unlimiformer can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time.
- Score: 67.04942180004805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the proposal of transformers, these models have been limited to bounded
input lengths, because of their need to attend to every token in the input. In
this work, we propose Unlimiformer: a general approach that wraps any existing
pretrained encoder-decoder transformer, and offloads the cross-attention
computation to a single k-nearest-neighbor (kNN) index, while the returned kNN
distances are the attention dot-product scores. This kNN index can be kept on
either the GPU or CPU memory and queried in sub-linear time; this way, we can
index practically unlimited input sequences, while every attention head in
every decoder layer retrieves its top-k keys, instead of attending to every
key. We evaluate Unlimiformer on several long-document and book-summarization
benchmarks, showing that it can process even 500k token-long inputs from the
BookSum dataset, without any input truncation at test time. We demonstrate that
Unlimiformer improves pretrained models such as BART and Longformer by
extending them to unlimited inputs without additional learned weights and
without modifying their code. We make our code and models publicly available at
https://github.com/abertsch72/unlimiformer .
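As a rough illustration of the retrieval step described in the abstract, the sketch below indexes encoder-side keys in a kNN index and lets a decoder query attend only to its retrieved top-k keys, so the returned inner-product scores play the role of attention logits. This is a single-head toy with illustrative names and sizes (`keys`, `values`, `top_k`, an exact flat faiss index), not the authors' implementation; the paper describes a single index shared across all heads and decoder layers with sub-linear query time.

```python
# Conceptual sketch of kNN-based cross-attention (not the official
# Unlimiformer code). Single attention head, toy sizes, exact flat index.
import numpy as np
import faiss  # pip install faiss-cpu

d = 64              # attention head dimension (illustrative)
n_tokens = 100_000  # stand-in for a book-length, untruncated input
top_k = 16          # keys retrieved per query instead of attending to every key

# Encoder-side keys and values for the full input sequence.
keys = np.random.randn(n_tokens, d).astype("float32")
values = np.random.randn(n_tokens, d).astype("float32")

# Index the keys once; inner-product search makes the returned kNN scores
# equal to the attention dot-product logits.
index = faiss.IndexFlatIP(d)
index.add(keys)

def knn_cross_attention(query: np.ndarray) -> np.ndarray:
    """Attend to only the top-k retrieved keys for one decoder query."""
    q = query.reshape(1, d).astype("float32")
    scores, ids = index.search(q, top_k)          # kNN scores = attention logits
    weights = np.exp(scores[0] / np.sqrt(d))
    weights /= weights.sum()                      # softmax over the retrieved keys
    return weights @ values[ids[0]]               # weighted sum of their values

out = knn_cross_attention(np.random.randn(d))
print(out.shape)  # (64,)
```

The flat index here is exact and linear in the input length, chosen only for clarity; the sub-linear query time mentioned in the abstract would require an approximate index kept on GPU or CPU memory.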
Related papers
- Equipping Transformer with Random-Access Reading for Long-Context Understanding [9.433800833564279]
Long-context modeling presents a significant challenge for transformer-based large language models.
We propose a novel reading strategy that enables transformers to efficiently process long documents without examining every token.
arXiv Detail & Related papers (2024-05-21T21:41:07Z) - Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition [7.963605445905696]
Conformer-based attention models have become the de facto backbone model for Automatic Speech Recognition tasks.
We propose a "Skip-and-Recover" Conformer architecture, named Skipformer, to shrink the input sequence length dynamically and inhomogeneously.
Our model reduces the input sequence length by a factor of 31 on Aishell-1 and 22 on the Librispeech corpus.
arXiv Detail & Related papers (2024-03-13T05:20:45Z) - Continuous-time Autoencoders for Regular and Irregular Time Series Imputation [21.25279298572273]
Time series imputation is one of the most fundamental tasks for time series.
Recent self-attention-based methods achieve state-of-the-art imputation performance.
Designing an imputation method based on continuous-time recurrent neural networks has long been overlooked.
arXiv Detail & Related papers (2023-12-27T14:13:42Z) - Memory-efficient Transformers via Top-$k$ Attention [23.672065688109395]
In this work, we propose a simple yet highly accurate approximation for vanilla attention.
We process the queries in chunks, and for each query, compute the top-$k$ scores with respect to the keys.
We show that our approach yields accuracy nearly identical to vanilla attention in multiple setups, including training from scratch, fine-tuning, and zero-shot inference (a minimal sketch of this chunked top-$k$ attention appears after this list).
arXiv Detail & Related papers (2021-06-13T02:30:23Z) - Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm retrieves the same number of nearest neighbors for every target token.
We propose Adaptive kNN-MT to dynamically determine the value of k for each target token.
arXiv Detail & Related papers (2021-05-27T09:27:42Z) - FSR: Accelerating the Inference Process of Transducer-Based Models by
Applying Fast-Skip Regularization [72.9385528828306]
A typical transducer model decodes the output sequence conditioned on the current acoustic state.
The number of blank tokens in the prediction results accounts for nearly 90% of all tokens.
We propose a method named fast-skip regularization, which tries to align the blank position predicted by a transducer with that predicted by a CTC model.
arXiv Detail & Related papers (2021-04-07T03:15:10Z) - Nystr\"omformer: A Nystr\"om-Based Algorithm for Approximating
Self-Attention [60.043273122786005]
We propose Nyströmformer, a model that exhibits favorable scalability as a function of sequence length.
The scalability of Nyströmformer enables application to longer sequences with thousands of tokens.
We perform evaluations on multiple downstream tasks on the GLUE benchmark and on reviews with standard sequence length, and find that Nyströmformer performs comparably to, and in a few cases even slightly better than, the standard Transformer.
arXiv Detail & Related papers (2021-02-07T20:06:59Z) - Learning to Encode Position for Transformer with Continuous Dynamical
Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded results along the position index with a continuous dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z) - Pruning Neural Belief Propagation Decoders [77.237958592189]
We introduce a method to tailor an overcomplete parity-check matrix to (neural) BP decoding using machine learning.
We achieve performance within 0.27 dB and 1.5 dB of maximum-likelihood (ML) performance while reducing the complexity of the decoder.
arXiv Detail & Related papers (2020-01-21T12:05:46Z)