Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
- URL: http://arxiv.org/abs/2312.17044v4
- Date: Tue, 2 Apr 2024 04:56:52 GMT
- Title: Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
- Authors: Liang Zhao, Xiaocheng Feng, Xiachong Feng, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu
- Abstract summary: The Transformer has taken the field of natural language processing (NLP) by storm since its introduction.
Large language models (LLMs) built upon it have captured worldwide attention due to their superior abilities.
All Transformer-based models, including these powerful LLMs, suffer from a preset length limit and can hardly generalize from short training sequences to longer ones at inference.
- Score: 40.98734594005952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer has taken the field of natural language processing (NLP) by storm since its introduction. Further, large language models (LLMs) built upon it have captured worldwide attention due to their superior abilities. Nevertheless, all Transformer-based models, including these powerful LLMs, suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones; that is, they cannot perform length extrapolation. Hence, a plethora of methods have been proposed to enhance the length extrapolation of the Transformer, among which positional encoding (PE) is recognized as the major factor. In this survey, we present these advances toward length extrapolation in a unified notation from the perspective of PE. Specifically, we first introduce extrapolatable PEs, including absolute and relative PEs. Then, we dive into extrapolation methods built on them, covering position interpolation and randomized position methods. Finally, we highlight several challenges and future directions in this area. Through this survey, we aim to give the reader a deep understanding of existing methods and to provide stimuli for future research.
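For concreteness, here is a minimal NumPy sketch of the sinusoidal absolute PE from the original Transformer, the prototypical extrapolatable absolute PE from which such surveys typically start (the formula is standard; the function name and shapes are illustrative, not code from the survey):

```python
import numpy as np

def sinusoidal_pe(num_positions: int, d_model: int) -> np.ndarray:
    """Sinusoidal absolute PE:
    PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    positions = np.arange(num_positions)[:, None]            # (L, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (L, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is a fixed function of the position index, so it can be
# evaluated at positions never seen during training -- the formal sense in
# which this PE is extrapolatable, even though quality degrades in practice.
print(sinusoidal_pe(8, 16).shape)  # (8, 16)
```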
Related papers
- Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective [35.947737679664016]
This paper offers a straightforward yet in-depth understanding of RoPE extensions from an attention perspective.
Using longer continual pretraining lengths for RoPE extensions could reduce attention uncertainty and significantly enhance extrapolation.
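As a concrete picture of what RoPE extensions manipulate, below is a minimal NumPy sketch of the RoPE rotation together with linear position interpolation, which rescales out-of-range positions back into the trained window (the rotation follows the standard RoPE formulation; the function names and the 2048/8192 lengths are illustrative assumptions, not taken from this paper):

```python
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """RoPE: rotate consecutive channel pairs of x by position-dependent angles."""
    d = x.shape[-1]
    inv_freq = 1.0 / base ** (np.arange(0, d, 2) / d)   # (d/2,) frequencies
    theta = positions[:, None] * inv_freq[None, :]      # (L, d/2) angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Linear position interpolation: squeeze unseen positions into the trained
# range by rescaling the index before rotation, trading resolution for reach.
train_len, test_len = 2048, 8192
positions = np.arange(test_len) * (train_len / test_len)  # fractional positions
q = rope_rotate(np.random.randn(test_len, 64), positions)
```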
arXiv Detail & Related papers (2024-06-19T07:23:33Z)
- Length Generalization of Causal Transformers without Position Encoding [59.802708262402824]
Generalizing to longer sentences is important for recent Transformer-based language models.
We study the length generalization property of Transformers without position encodings (NoPE).
We find that although NoPE can extend to sequences longer than the commonly used explicit position encodings, it still has a limited context length.
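For intuition, here is a minimal sketch of the NoPE setting: plain causal attention with no positional term, so the only order signal available to the model is the causal mask itself (the function name is ours; this illustrates the setting, not the paper's code):

```python
import numpy as np

def causal_attention_nope(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal attention with no positional term: in the NoPE setting, order
    information enters only through the causal mask."""
    L, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    logits[np.triu_indices(L, k=1)] = -np.inf   # mask out future positions
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = causal_attention_nope(*np.random.randn(3, 10, 16))  # q, k, v of shape (10, 16)
```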
arXiv Detail & Related papers (2024-04-18T14:38:32Z)
- Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models [17.300251335326173]
Large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses.
This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs.
arXiv Detail & Related papers (2024-02-03T19:20:02Z)
- Exploring Transformer Extrapolation [19.729619149887014]
Length extrapolation has attracted considerable attention recently since it allows transformers to be tested on longer sequences than those used in training.
Previous research has shown that this property can be attained by using carefully designed Relative Positional Encodings (RPEs).
This paper attempts to determine what types of RPEs allow for length extrapolation through a thorough mathematical and empirical analysis.
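The object under analysis can be written as a bias on the attention logits that depends only on relative distance; here is a small sketch of this generic RPE form (the linear-decay bias below is just one illustrative choice, not a condition derived in the paper):

```python
import numpy as np

def rpe_logits(q: np.ndarray, k: np.ndarray, bias_fn) -> np.ndarray:
    """Attention logits with a generic relative positional bias:
    score(i, j) = q_i . k_j / sqrt(d) + b(i - j)."""
    L, d = q.shape
    rel = np.arange(L)[:, None] - np.arange(L)[None, :]   # matrix of i - j
    return q @ k.T / np.sqrt(d) + bias_fn(rel)

# Example bias: penalize attention linearly in absolute relative distance.
logits = rpe_logits(np.random.randn(6, 8), np.random.randn(6, 8),
                    lambda rel: -0.5 * np.abs(rel))
```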
arXiv Detail & Related papers (2023-07-19T17:37:03Z)
- The Impact of Positional Encoding on Length Generalization in Transformers [50.48278691801413]
We compare the length generalization performance of decoder-only Transformers with five different position encoding approaches.
Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks.
arXiv Detail & Related papers (2023-05-31T00:29:55Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis [72.71398034617607]
We dissect a relative positional embedding design, ALiBi, via the lens of receptive field analysis.
We modify the vanilla sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence.
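For reference, here is a minimal sketch of the ALiBi bias whose receptive field the paper dissects: each head adds a linearly distance-decaying penalty to the attention logits (the geometric slope schedule follows the ALiBi paper for power-of-two head counts; the helper name is ours):

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """ALiBi: per-head bias -m_h * (i - j) added to causal attention logits."""
    # Geometric slopes 2^(-8h/H), h = 1..H (exact for power-of-two head counts).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    distance = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    distance = np.maximum(distance, 0)        # causal: only look at the past
    return -slopes[:, None, None] * distance[None, :, :]   # (H, L, L)

bias = alibi_bias(num_heads=8, seq_len=16)  # add to QK^T logits before softmax
```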
arXiv Detail & Related papers (2022-12-20T15:40:17Z)
- Your Transformer May Not be as Powerful as You Expect [88.11364619182773]
We mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions.
We present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is.
We develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions for universal approximation.
arXiv Detail & Related papers (2022-05-26T14:51:30Z)