ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities
- URL: http://arxiv.org/abs/2509.19569v1
- Date: Tue, 23 Sep 2025 20:51:51 GMT
- Title: ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities
- Authors: Aleksis Datseris, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva
- Abstract summary: This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE). The proposed method uses a novel embedding strategy that encodes exact positional information by overriding specific dimensions of the embedding vectors. In causal language modeling, ExPE embeddings significantly reduce perplexity compared to rotary and sinusoidal embeddings when tested on sequences longer than those used in training.
- Score: 3.4293828076481496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel approach to position embeddings in transformer models, named "Exact Positional Embeddings" (ExPE), an absolute positional embedding method that can extrapolate to sequences longer than those it was trained on. Traditional transformer models rely on absolute or relative position embeddings to incorporate positional information into token embeddings, and these often struggle to extrapolate to sequences longer than those seen during training. Our proposed method uses a novel embedding strategy that encodes exact positional information by overriding specific dimensions of the embedding vectors, thereby enabling a more precise representation of token positions. The proposed approach not only maintains the integrity of the original embeddings but also enhances the model's ability to generalize to longer sequences. In causal language modeling, ExPE embeddings significantly reduce perplexity compared to rotary and sinusoidal embeddings when tested on sequences longer than those used in training.
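The abstract states that ExPE writes exact positional information into specific dimensions of the embedding vectors but does not spell out the formula. The NumPy sketch below is therefore only an illustration of that idea: the `expe_like_override` function, its channel count, and its scaling are assumptions rather than the paper's recipe, and the sinusoidal baseline follows the standard formulation the abstract uses as a comparison point.

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (one baseline ExPE is compared
    against). Assumes an even d_model."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def expe_like_override(token_emb: np.ndarray, num_pos_dims: int = 4,
                       max_pos: float = 1_000_000.0) -> np.ndarray:
    """Hypothetical ExPE-style injection: rather than *adding* a positional
    signal, overwrite the first `num_pos_dims` channels of every token
    embedding with an exact encoding of its absolute position, at
    progressively finer resolutions. The channel count and scaling here are
    illustrative guesses, not the paper's exact scheme."""
    out = token_emb.copy()
    seq_len = token_emb.shape[0]
    positions = np.arange(seq_len, dtype=np.float64)
    for k in range(num_pos_dims):
        out[:, k] = (positions / max_pos * 10.0 ** k) % 1.0
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))            # 8 tokens, 16-dimensional embeddings
    print(sinusoidal_pe(8, 16).shape)         # (8, 16)
    print(expe_like_override(emb)[:, :4])     # first 4 channels now carry exact positions
```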
Related papers
- SeqPE: Transformer with Sequential Position Encoding [76.22159277300891]
SeqPE represents each $n$-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM), and accuracy, but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign.
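A minimal sketch of the digit-sequence idea summarized above; the fixed width, the mean pooling, and the `sequential_position_encoder` stand-in are assumptions, since the actual SeqPE encoder is a learned module whose details are not given in this summary.

```python
import numpy as np

def position_to_digit_tokens(position: int, width: int = 6) -> list[int]:
    """Represent a 1-D position index as a fixed-width symbolic digit sequence,
    e.g. 1234 -> [0, 0, 1, 2, 3, 4]; an n-D index would concatenate such sequences."""
    return [int(c) for c in str(position).zfill(width)]

def sequential_position_encoder(digits: list[int], digit_emb: np.ndarray) -> np.ndarray:
    """Toy stand-in for the lightweight sequential encoder: embed each digit
    token and mean-pool. SeqPE learns this encoder end to end; pooling is a
    placeholder here."""
    return digit_emb[digits].mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    digit_emb = rng.normal(size=(10, 32))     # one embedding per digit token 0-9
    pe_small = sequential_position_encoder(position_to_digit_tokens(17), digit_emb)
    pe_large = sequential_position_encoder(position_to_digit_tokens(100_017), digit_emb)
    print(pe_small.shape, pe_large.shape)     # (32,) (32,) for any position
```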
arXiv Detail & Related papers (2025-06-16T09:16:40Z)
- The Role of Sparsity for Length Generalization in Transformers [58.65997625433689]
We propose a new theoretical framework to study length generalization for the next-token prediction task. We show that length generalization occurs as long as each predicted token depends on a small (fixed) number of previous tokens. We introduce Predictive Position Coupling, which trains the transformer to predict the position IDs used in a positional coupling approach.
arXiv Detail & Related papers (2025-02-24T03:01:03Z)
- Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding [89.52931576290976]
We present conTextualized equivariAnt Position Encoding (TAPE), a novel framework that enhances positional embeddings by incorporating sequence content across layers. Our method can be easily integrated into pre-trained transformers, offering parameter-efficient fine-tuning with minimal overhead.
arXiv Detail & Related papers (2025-01-01T03:23:00Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- A Frustratingly Easy Improvement for Position Embeddings via Random Padding [68.75670223005716]
In this paper, we propose a simple but effective strategy, Random Padding, without any modifications to existing pre-trained language models.
Experiments show that Random Padding can significantly improve model performance on the instances whose answers are located at rear positions.
arXiv Detail & Related papers (2023-05-08T17:08:14Z)
- Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis [72.71398034617607]
We dissect a relative positional embedding design, ALiBi, via the lens of receptive field analysis.
We modify the vanilla Sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence.
arXiv Detail & Related papers (2022-12-20T15:40:17Z)
- CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings [33.87449556591022]
We propose an augmentation-based approach (CAPE) for absolute positional embeddings.
CAPE keeps the advantages of both absolute (simplicity and speed) and relative position embeddings (better generalization).
arXiv Detail & Related papers (2021-06-06T14:54:55Z)
- Improve Transformer Models with Better Relative Position Embeddings [18.59434691153783]
Transformer architectures rely on explicit position encodings to preserve a notion of word order.
We argue that existing work does not fully utilize position information.
We propose new techniques that encourage increased interaction between query, key and relative position embeddings.
arXiv Detail & Related papers (2020-09-28T22:18:58Z)
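For reference, the rotary embeddings that the ExPE abstract above reports gains over can be sketched as follows. This is the standard rotary formulation (rotating channel pairs of the query/key vectors by position-dependent angles), not anything specific to the papers listed here.

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to a (seq_len, d_model) array of query
    or key vectors: each consecutive channel pair is rotated by an angle that
    grows linearly with the token's absolute position, so dot products between
    rotated queries and keys depend only on their relative offset."""
    seq_len, d_model = x.shape
    half = d_model // 2
    positions = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    inv_freq = 1.0 / (base ** (np.arange(half) * 2.0 / d_model))  # (half,)
    angles = positions * inv_freq[None, :]                        # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                               # even / odd channels
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 16))
    print(apply_rope(q).shape)   # (8, 16)
```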