Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
- URL: http://arxiv.org/abs/2312.17044v4
- Date: Tue, 2 Apr 2024 04:56:52 GMT
- Title: Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
- Authors: Liang Zhao, Xiaocheng Feng, Xiachong Feng, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu
- Abstract summary: The Transformer has taken the field of natural language processing (NLP) by storm since its introduction.
Large language models (LLMs) built upon it have captured worldwide attention due to their superior abilities.
All Transformer-based models, including these powerful LLMs, suffer from a preset length limit and can hardly generalize from short training sequences to longer ones at inference.
- Score: 40.98734594005952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer has taken the field of natural language processing (NLP) by storm since its introduction. Further, large language models (LLMs) built upon it have captured worldwide attention due to their superior abilities. Nevertheless, all Transformer-based models, including these powerful LLMs, suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones; that is, they cannot perform length extrapolation. Hence, a plethora of methods have been proposed to enhance the length extrapolation of the Transformer, among which positional encoding (PE) is recognized as the major factor. In this survey, we present these advances toward length extrapolation in a unified notation from the perspective of PE. Specifically, we first introduce extrapolatable PEs, including absolute and relative PEs. Then, we dive into extrapolation methods built on them, covering position interpolation and randomized position methods. Finally, we highlight several challenges and future directions in this area. Through this survey, we aim to give the reader a deep understanding of existing methods and to provide stimuli for future research.
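For concreteness, here is a minimal NumPy sketch of the sinusoidal absolute PE from the original Transformer, the prototypical extrapolatable absolute PE from which such surveys typically start (the formula is standard; the function name and shapes are illustrative, not code from the survey):

```python
import numpy as np

def sinusoidal_pe(num_positions: int, d_model: int) -> np.ndarray:
    """Sinusoidal absolute PE:
    PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    positions = np.arange(num_positions)[:, None]            # (L, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (L, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is a fixed function of the position index, so it can be
# evaluated at positions never seen during training -- the formal sense in
# which this PE is extrapolatable, even though quality degrades in practice.
print(sinusoidal_pe(8, 16).shape)  # (8, 16)
```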
Related papers
- Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective [35.947737679664016]
This paper offers a straightforward yet in-depth understanding of RoPE extensions from an attention perspective.
Using longer continual pretraining lengths for RoPE extensions could reduce attention uncertainty and significantly enhance extrapolation.
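As a concrete picture of what RoPE extensions manipulate, below is a minimal NumPy sketch of the RoPE rotation together with linear position interpolation, which rescales out-of-range positions back into the trained window (the rotation follows the standard RoPE formulation; the function names and the 2048/8192 lengths are illustrative assumptions, not taken from this paper):

```python
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """RoPE: rotate consecutive channel pairs of x by position-dependent angles."""
    d = x.shape[-1]
    inv_freq = 1.0 / base ** (np.arange(0, d, 2) / d)   # (d/2,) frequencies
    theta = positions[:, None] * inv_freq[None, :]      # (L, d/2) angles
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Linear position interpolation: squeeze unseen positions into the trained
# range by rescaling the index before rotation, trading resolution for reach.
train_len, test_len = 2048, 8192
positions = np.arange(test_len) * (train_len / test_len)  # fractional positions
q = rope_rotate(np.random.randn(test_len, 64), positions)
```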
arXiv Detail & Related papers (2024-06-19T07:23:33Z)
- Length Generalization of Causal Transformers without Position Encoding [59.802708262402824]
Generalizing to longer sentences is important for recent Transformer-based language models.
We study the length generalization property of Transformers without position encodings (NoPE).
We find that although NoPE can extend to sequences longer than the commonly used explicit position encodings, it still has a limited context length.
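For intuition, here is a minimal sketch of the NoPE setting: plain causal attention with no positional term, so the only order signal available to the model is the causal mask itself (the function name is ours; this illustrates the setting, not the paper's code):

```python
import numpy as np

def causal_attention_nope(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal attention with no positional term: in the NoPE setting, order
    information enters only through the causal mask."""
    L, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    logits[np.triu_indices(L, k=1)] = -np.inf   # mask out future positions
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = causal_attention_nope(*np.random.randn(3, 10, 16))  # q, k, v of shape (10, 16)
```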
arXiv Detail & Related papers (2024-04-18T14:38:32Z)
- Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models [17.300251335326173]
Large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses.
This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs.
arXiv Detail & Related papers (2024-02-03T19:20:02Z)
- Exploring Transformer Extrapolation [19.729619149887014]
Length extrapolation has attracted considerable attention recently since it allows transformers to be tested on longer sequences than those used in training.
Previous research has shown that this property can be attained by using carefully designed Relative Positional Encodings (RPEs).
This paper attempts to determine what types of RPEs allow for length extrapolation through a thorough mathematical and empirical analysis.
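The object under analysis can be written as a bias on the attention logits that depends only on relative distance; here is a small sketch of this generic RPE form (the linear-decay bias below is just one illustrative choice, not a condition derived in the paper):

```python
import numpy as np

def rpe_logits(q: np.ndarray, k: np.ndarray, bias_fn) -> np.ndarray:
    """Attention logits with a generic relative positional bias:
    score(i, j) = q_i . k_j / sqrt(d) + b(i - j)."""
    L, d = q.shape
    rel = np.arange(L)[:, None] - np.arange(L)[None, :]   # matrix of i - j
    return q @ k.T / np.sqrt(d) + bias_fn(rel)

# Example bias: penalize attention linearly in absolute relative distance.
logits = rpe_logits(np.random.randn(6, 8), np.random.randn(6, 8),
                    lambda rel: -0.5 * np.abs(rel))
```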
arXiv Detail & Related papers (2023-07-19T17:37:03Z)
- The Impact of Positional Encoding on Length Generalization in Transformers [50.48278691801413]
We compare the length generalization performance of decoder-only Transformers with five different position encoding approaches.
Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks.
arXiv Detail & Related papers (2023-05-31T00:29:55Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis [72.71398034617607]
We dissect a relative positional embedding design, ALiBi, via the lens of receptive field analysis.
We modify the vanilla sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence.
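For reference, here is a minimal sketch of the ALiBi bias whose receptive field the paper dissects: each head adds a linearly distance-decaying penalty to the attention logits (the geometric slope schedule follows the ALiBi paper for power-of-two head counts; the helper name is ours):

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """ALiBi: per-head bias -m_h * (i - j) added to causal attention logits."""
    # Geometric slopes 2^(-8h/H), h = 1..H (exact for power-of-two head counts).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    distance = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    distance = np.maximum(distance, 0)        # causal: only look at the past
    return -slopes[:, None, None] * distance[None, :, :]   # (H, L, L)

bias = alibi_bias(num_heads=8, seq_len=16)  # add to QK^T logits before softmax
```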
arXiv Detail & Related papers (2022-12-20T15:40:17Z)
- Your Transformer May Not be as Powerful as You Expect [88.11364619182773]
We mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions.
We present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is.
We develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions for universal approximation.
arXiv Detail & Related papers (2022-05-26T14:51:30Z)