Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
- URL: http://arxiv.org/abs/2312.17044v5
- Date: Sun, 06 Oct 2024 08:37:05 GMT
- Title: Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
- Authors: Liang Zhao, Xiachong Feng, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu
- Abstract summary: All Transformer-based models, including large language models (LLMs), suffer from a preset length limit.
Numerous methods have emerged to enhance the length extrapolation of Transformers.
This survey aims to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.
- Score: 40.289596031245374
- Abstract: Built upon the Transformer, large language models (LLMs) have captured worldwide attention due to their remarkable abilities. Nevertheless, all Transformer-based models, including LLMs, suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones; that is, they cannot perform length extrapolation to handle long sequences, which severely hinders their application in scenarios demanding long input sequences such as legal or scientific documents. Thus, numerous methods have emerged to enhance the length extrapolation of Transformers. Despite the great research effort, a systematic survey is still lacking. To fill this gap, we delve into these advances in a unified notation from the perspective of positional encoding (PE), as it has been considered the primary factor in length extrapolation. Specifically, we begin with extrapolatable PEs that have dominated this research field. Then, we dive into extrapolation methods based on them, covering position interpolation and randomized position methods. Finally, several challenges and future directions in this area are highlighted. Through this survey, we aim to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.
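To make two of the ideas named in the abstract concrete, here is a minimal NumPy sketch of rotary position embedding (RoPE), currently the dominant extrapolatable PE, together with linear position interpolation, which rescales position indices by train_len / target_len so that inference positions fall back inside the trained range. All names and values (apply_rope, the interpolation scale, the 2048/8192 context lengths) are illustrative assumptions, not taken from the surveyed papers.

```python
import numpy as np

def rope_angles(head_dim: int, position: float, base: float = 10000.0) -> np.ndarray:
    """Per-pair rotation angles theta_i * position, with theta_i = base^(-2i/d)."""
    i = np.arange(head_dim // 2)
    inv_freq = base ** (-2.0 * i / head_dim)
    return position * inv_freq

def apply_rope(x: np.ndarray, position: float, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive (even, odd) channel pairs of x by position-dependent angles."""
    angles = rope_angles(x.shape[-1], position, base)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

# Position interpolation: squeeze unseen positions back into the trained range
# by scaling the position index before computing the rotation angles.
train_len, target_len = 2048, 8192           # hypothetical context lengths
scale = train_len / target_len               # 0.25 in this example

q = np.random.randn(64)                      # one attention head's query vector
pos = 5000                                   # a position beyond the training limit
q_extrapolated = apply_rope(q, pos)          # plain RoPE: angles never seen in training
q_interpolated = apply_rope(q, pos * scale)  # interpolated: angles stay in the trained range
```

Because the dot product of two RoPE-rotated vectors depends only on their relative offset, rescaling absolute indices compresses, rather than breaks, the relative-distance structure the model was trained on; randomized position methods instead sample training positions from a range wider than the training length, so that large position values are not entirely unseen at inference time.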
Related papers
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.
We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective [35.947737679664016]
This paper offers a straightforward yet in-depth understanding of RoPE extensions from an attention perspective.
Using longer continual pretraining lengths for RoPE extensions could reduce attention uncertainty and significantly enhance extrapolation.
arXiv Detail & Related papers (2024-06-19T07:23:33Z) - Length Generalization of Causal Transformers without Position Encoding [59.802708262402824]
Generalizing to longer sentences is important for recent Transformer-based language models.
We study the length generalization property of Transformers without position encodings.
We find that although NoPE can extend to sequences longer than the commonly used explicit position encodings, it still has a limited context length.
arXiv Detail & Related papers (2024-04-18T14:38:32Z) - Transformers Can Achieve Length Generalization But Not Robustly [76.06308648699357]
We show that the success of length generalization is intricately linked to the data format and the type of position encoding.
We show for the first time that standard Transformers can extrapolate to a sequence length that is 2.5x the input length.
arXiv Detail & Related papers (2024-02-14T18:18:29Z) - On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era [59.279784235147254]
This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence.
It emphasizes novel research opportunities that become prominent when abandoning the idea of processing long sequences.
arXiv Detail & Related papers (2024-02-12T23:55:55Z) - Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models [17.300251335326173]
Large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses.
This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs.
arXiv Detail & Related papers (2024-02-03T19:20:02Z) - Exploring Transformer Extrapolation [19.729619149887014]
Length extrapolation has attracted considerable attention recently since it allows transformers to be tested on longer sequences than those used in training.
Previous research has shown that this property can be attained by using carefully designed Relative Positional Encodings (RPEs).
This paper attempts to determine what types of RPEs allow for length extrapolation through a thorough mathematical and empirical analysis.
arXiv Detail & Related papers (2023-07-19T17:37:03Z) - Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis [72.71398034617607]
We dissect a relative positional embedding design, ALiBi, via the lens of receptive field analysis (a minimal sketch of ALiBi's linear attention bias follows this list).
We modify the vanilla sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence.
arXiv Detail & Related papers (2022-12-20T15:40:17Z)
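The last entry above analyzes ALiBi, which replaces position embeddings entirely with a fixed, head-specific linear penalty on the attention logits. Below is a minimal NumPy sketch of that bias under a causal-attention assumption; the function name alibi_bias and the toy shapes are illustrative, while the slope schedule 2^(-8h/H) follows the published ALiBi recipe.

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Per-head linear distance penalty added to attention logits, as in ALiBi.

    Each head h gets a fixed slope m_h from a geometric sequence; the bias for
    query i attending to key j (j <= i) is -m_h * (i - j), so more distant keys
    are penalized more and no learned position parameters are involved.
    """
    # Head-specific slopes: 2^(-8/num_heads), 2^(-16/num_heads), ...
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    distance = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    distance = np.tril(distance)                  # causal: only keep j <= i
    return -slopes[:, None, None] * distance      # shape: (heads, seq, seq)

# Usage: add the bias to the raw attention logits before the softmax.
scores = np.random.randn(8, 16, 16)               # (heads, queries, keys), toy values
biased = scores + alibi_bias(seq_len=16, num_heads=8)
```

Because the penalty grows linearly with query-key distance and involves no trained parameters, it can be evaluated at any sequence length, which is the train-short, test-long property that the receptive-field analysis in the entry above dissects.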