Contextual Position Encoding: Learning to Count What's Important
- URL: http://arxiv.org/abs/2405.18719v2
- Date: Thu, 30 May 2024 17:51:53 GMT
- Title: Contextual Position Encoding: Learning to Count What's Important
- Authors: Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar
- Abstract summary: We propose a new position encoding method, Contextual Position Encoding (CoPE).
CoPE allows positions to be conditioned on context by incrementing position on certain tokens determined by the model.
We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail.
- Score: 42.038277620194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.
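The mechanism the abstract describes can be sketched in a few lines of code. The snippet below is a minimal, illustrative NumPy sketch, not the paper's exact implementation: a sigmoid gate decides which preceding tokens increment the position count, the resulting fractional positions are turned into embeddings by interpolating between integer position embeddings, and those embeddings contribute additively to the attention logits. Function names, the shared use of the query/key vectors for the gates, the scaling, and the clipping are all simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cope_attention(q, k, v, pos_emb):
    """Single-head causal attention with a CoPE-style contextual position.

    q, k, v : (T, d) query/key/value vectors for one sequence
    pos_emb : (max_pos, d) embeddings for integer positions
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                  # (T, T) content logits

    # Gates decide which keys "count" toward the position (sigmoid of q.k).
    gates = 1.0 / (1.0 + np.exp(-scores))          # (T, T)
    causal = np.tril(np.ones((T, T)))
    gates = gates * causal                         # only past/current tokens count

    # Contextual position of key j w.r.t. query i: sum of gates from j up to i
    # (reverse cumulative sum over keys; future positions are already zeroed).
    pos = np.flip(np.cumsum(np.flip(gates, axis=1), axis=1), axis=1)
    pos = np.clip(pos, 0, pos_emb.shape[0] - 1)    # keep within the table (simplification)

    # Fractional positions: interpolate between the two nearest integer embeddings.
    lo = np.floor(pos).astype(int)
    hi = np.ceil(pos).astype(int)
    frac = (pos - lo)[..., None]                   # (T, T, 1)
    e_pos = (1 - frac) * pos_emb[lo] + frac * pos_emb[hi]   # (T, T, d)

    # Position logits: each query dotted with its per-key interpolated embedding.
    pos_logits = np.einsum('id,ijd->ij', q, e_pos) / np.sqrt(d)

    logits = np.where(causal > 0, scores + pos_logits, -1e9)
    return softmax(logits, axis=-1) @ v
```

Because the gates depend on the query-key context rather than on raw token counts, the same mechanism can learn to count tokens, words, nouns, or sentences, which is what enables the more abstract position addressing the abstract refers to.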
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms the state-of-the-art methods, with 2.03%/1.22%/2, 1.83%, and 4.62% gains on the respective datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z) - Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation [69.68831888599476]
We develop a new positional encoding method called Bilevel Positional Encoding (BiPE).
Theoretical analysis shows this disentanglement of positional information makes learning more effective.
Our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
arXiv Detail & Related papers (2024-01-29T18:59:07Z) - The Locality and Symmetry of Positional Encodings [9.246374019271938]
We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
arXiv Detail & Related papers (2023-10-19T16:15:15Z) - Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z) - Relative Position Prediction as Pre-training for Text Encoders [0.0]
We argue that a position-centric perspective is more general and useful.
We adapt the relative position encoding paradigm in NLP to create relative labels for self-supervised learning.
arXiv Detail & Related papers (2022-02-02T17:13:31Z) - Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z) - Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings brings mixed correlations between the two heterogeneous sources of information.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
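The untying idea in the last entry is easy to state in code. The sketch below is an illustrative NumPy version under the usual scaled dot-product attention assumptions (the projection names are hypothetical, and the scaling is simplified): content-content and position-position logits are computed with separate projection matrices and summed, rather than adding position embeddings to word embeddings before a single shared projection, which is what introduces the mixed word-position correlation terms.

```python
import numpy as np

def untied_attention_logits(x, p, Wq, Wk, Uq, Uk):
    """Attention logits with untied content/position terms (TUPE-style sketch).

    x : (T, d) word embeddings        p : (T, d) absolute position embeddings
    Wq, Wk : (d, d) content projections   Uq, Uk : (d, d) position projections
    """
    d = x.shape[1]
    content = (x @ Wq) @ (x @ Wk).T     # word-to-word correlations
    position = (p @ Uq) @ (p @ Uk).T    # position-to-position correlations
    # The word-position cross terms of the usual "add then project" scheme are dropped.
    return (content + position) / np.sqrt(2 * d)
```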
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.