Contextual Position Encoding: Learning to Count What's Important
- URL: http://arxiv.org/abs/2405.18719v2
- Date: Thu, 30 May 2024 17:51:53 GMT
- Title: Contextual Position Encoding: Learning to Count What's Important
- Authors: Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar
- Abstract summary: We propose a new position encoding method, Contextual Position Encoding (CoPE).
CoPE allows positions to be conditioned on context by incrementing position on certain tokens determined by the model.
We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail.
- Score: 42.038277620194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.
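The mechanism the abstract describes can be sketched in a few lines of code. The snippet below is a minimal, illustrative NumPy sketch, not the paper's exact implementation: a sigmoid gate decides which preceding tokens increment the position count, the resulting fractional positions are turned into embeddings by interpolating between integer position embeddings, and those embeddings contribute additively to the attention logits. Function names, the shared use of the query/key vectors for the gates, the scaling, and the clipping are all simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cope_attention(q, k, v, pos_emb):
    """Single-head causal attention with a CoPE-style contextual position.

    q, k, v : (T, d) query/key/value vectors for one sequence
    pos_emb : (max_pos, d) embeddings for integer positions
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                  # (T, T) content logits

    # Gates decide which keys "count" toward the position (sigmoid of q.k).
    gates = 1.0 / (1.0 + np.exp(-scores))          # (T, T)
    causal = np.tril(np.ones((T, T)))
    gates = gates * causal                         # only past/current tokens count

    # Contextual position of key j w.r.t. query i: sum of gates from j up to i
    # (reverse cumulative sum over keys; future positions are already zeroed).
    pos = np.flip(np.cumsum(np.flip(gates, axis=1), axis=1), axis=1)
    pos = np.clip(pos, 0, pos_emb.shape[0] - 1)    # keep within the table (simplification)

    # Fractional positions: interpolate between the two nearest integer embeddings.
    lo = np.floor(pos).astype(int)
    hi = np.ceil(pos).astype(int)
    frac = (pos - lo)[..., None]                   # (T, T, 1)
    e_pos = (1 - frac) * pos_emb[lo] + frac * pos_emb[hi]   # (T, T, d)

    # Position logits: each query dotted with its per-key interpolated embedding.
    pos_logits = np.einsum('id,ijd->ij', q, e_pos) / np.sqrt(d)

    logits = np.where(causal > 0, scores + pos_logits, -1e9)
    return softmax(logits, axis=-1) @ v
```

Because the gates depend on the query-key context rather than on raw token counts, the same mechanism can learn to count tokens, words, nouns, or sentences, which is what enables the more abstract position addressing the abstract refers to.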
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms the state-of-the-art methods, with 2.03%/1.22%/2, 1.83%, and 4.62% gains on the respective datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z) - Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation [69.68831888599476]
We develop a new positional encoding method called Bilevel Positional Encoding (BiPE).
Theoretical analysis shows this disentanglement of positional information makes learning more effective.
Our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
arXiv Detail & Related papers (2024-01-29T18:59:07Z) - The Locality and Symmetry of Positional Encodings [9.246374019271938]
We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
arXiv Detail & Related papers (2023-10-19T16:15:15Z) - Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z) - Relative Position Prediction as Pre-training for Text Encoders [0.0]
We argue that a position-centric perspective is more general and useful.
We adapt the relative position encoding paradigm in NLP to create relative labels for self-supervised learning.
arXiv Detail & Related papers (2022-02-02T17:13:31Z) - Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features.
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z) - Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings brings mixed correlations between the two heterogeneous sources of information.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
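The untying idea in the last entry is easy to state in code. The sketch below is an illustrative NumPy version under the usual scaled dot-product attention assumptions (the projection names are hypothetical, and the scaling is simplified): content-content and position-position logits are computed with separate projection matrices and summed, rather than adding position embeddings to word embeddings before a single shared projection, which is what introduces the mixed word-position correlation terms.

```python
import numpy as np

def untied_attention_logits(x, p, Wq, Wk, Uq, Uk):
    """Attention logits with untied content/position terms (TUPE-style sketch).

    x : (T, d) word embeddings        p : (T, d) absolute position embeddings
    Wq, Wk : (d, d) content projections   Uq, Uk : (d, d) position projections
    """
    d = x.shape[1]
    content = (x @ Wq) @ (x @ Wk).T     # word-to-word correlations
    position = (p @ Uq) @ (p @ Uk).T    # position-to-position correlations
    # The word-position cross terms of the usual "add then project" scheme are dropped.
    return (content + position) / np.sqrt(2 * d)
```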
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.