The Locality and Symmetry of Positional Encodings
- URL: http://arxiv.org/abs/2310.12864v1
- Date: Thu, 19 Oct 2023 16:15:15 GMT
- Title: The Locality and Symmetry of Positional Encodings
- Authors: Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
- Abstract summary: We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
- Score: 9.246374019271938
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Positional Encodings (PEs) are used to inject word-order information into
transformer-based language models. While they can significantly enhance the
quality of sentence representations, their specific contribution to language
models is not fully understood, especially given recent findings that various
positional encodings are insensitive to word order. In this work, we conduct a
systematic study of positional encodings in Bidirectional Masked Language
Models (BERT-style), which complements existing work in three
aspects: (1) We uncover the core function of PEs by identifying two common
properties, Locality and Symmetry; (2) We show that the two properties are
closely correlated with performance on downstream tasks; (3) We quantify
the weakness of current PEs by introducing two new probing tasks, on which
current PEs perform poorly. We believe that these results are the basis for
developing better PEs for transformer-based language models. The code is
available at https://github.com/tigerchen52/locality_symmetry
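The abstract names Locality and Symmetry but does not define them in this listing. As an illustration only, the following Python sketch assumes the fixed sinusoidal PE of the original Transformer, builds a hypothetical "position-only" attention map softmax(PE_i · PE_j / sqrt(d)) that ignores word content, and scores it with two proxy metrics, `locality_score` and `asymmetry_score`. These helper names and both formulas are illustrative assumptions, not the definitions or code from the paper's repository.

```python
import math

# Assumption: the fixed sinusoidal PE from the original Transformer, used here
# only to illustrate the two properties; BERT itself uses learned absolute PEs.
def sinusoidal_pe(pos: int, d_model: int = 64) -> list[float]:
    """Interleaved sin/cos features at geometrically spaced frequencies."""
    vec = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        vec.append(math.sin(pos * freq))
        vec.append(math.cos(pos * freq))
    return vec

def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pe_only_attention(n: int, d_model: int = 64) -> list[list[float]]:
    """Hypothetical position-only attention: row i is softmax_j(PE_i . PE_j / sqrt(d))."""
    pes = [sinusoidal_pe(p, d_model) for p in range(n)]
    scale = math.sqrt(d_model)
    return [softmax([dot(pes[i], pes[j]) / scale for j in range(n)]) for i in range(n)]

def locality_score(attn: list[list[float]]) -> float:
    """Hypothetical proxy: mean over queries of sum_j A[i][j] * 2^-|i-j|.
    Equals 1.0 when every token attends only to itself."""
    n = len(attn)
    total = sum(attn[i][j] * 2.0 ** -abs(i - j) for i in range(n) for j in range(n))
    return total / n

def asymmetry_score(attn: list[list[float]]) -> float:
    """Hypothetical proxy: mean |A[i][i-k] - A[i][i+k]| over all valid (i, k).
    Equals 0.0 for a perfectly symmetric map."""
    n = len(attn)
    diffs = [abs(attn[i][i - k] - attn[i][i + k])
             for i in range(n) for k in range(1, n)
             if i - k >= 0 and i + k < n]
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    n = 16
    attn = pe_only_attention(n)
    uniform = [[1.0 / n] * n for _ in range(n)]
    print(f"locality  PE-only: {locality_score(attn):.3f}  uniform: {locality_score(uniform):.3f}")
    print(f"asymmetry PE-only: {asymmetry_score(attn):.1e}")
```

Because PE_i · PE_j = Σ_k cos((i − j)·ω_k), where ω_k are the sinusoid frequencies, the score depends only on the offset i − j and is even in it. This position-only map is therefore symmetric by construction, and its largest weight in every row sits on the diagonal. Learned absolute PEs, as used in BERT, carry no such built-in guarantee.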
 
      
Related papers
- Entropy-Driven Pre-Tokenization for Byte-Pair Encoding [4.145560327709288]
 Two entropy-informed pre-tokenization strategies guide BPE segmentation using unsupervised information-theoretic cues.
We evaluate both methods on a subset of the PKU dataset and demonstrate substantial improvements in segmentation precision, recall, and F1 score compared to standard BPE.
 arXiv (2025-06-18T21:25:55Z)
- SeqPE: Transformer with Sequential Position Encoding [76.22159277300891]
 SeqPE represents each n-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings.
Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM), and accuracy, but also generalizes seamlessly to multi-dimensional inputs without requiring manual architectural redesign.
 arXiv (2025-06-16T09:16:40Z)
- PaTH Attention: Position Encoding via Accumulating Householder Transformations [56.32365080761523]
 PaTH is a flexible data-dependent position encoding scheme based on accumulated products of Householder transformations.
We derive an efficient parallel algorithm for training by exploiting a compact representation of products of Householder matrices.
 arXiv (2025-05-22T08:36:09Z)
- Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding [89.52931576290976]
 Transformers rely on both content-based and position-based addressing mechanisms to make predictions.
TAPE is a novel framework that enhances positional embeddings by incorporating sequence content across layers.
Our method can be easily integrated into pre-trained transformers, offering parameter-efficient fine-tuning with minimal overhead.
 arXiv (2025-01-01T03:23:00Z)
- Learning interpretable positional encodings in transformers depends on initialization [14.732076081683418]
 Positional encoding (PE) provides essential information that distinguishes the position and order amongst tokens in a sequence.
We show that the choice of initialization for a learnable PE greatly influences its ability to learn interpretable PEs.
We find that a learned PE initialized from a small-norm distribution can uncover interpretable PEs that mirror ground-truth positions in multiple dimensions.
 arXiv (2024-06-12T14:37:29Z)
- PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models [0.0]
 Polynomial Based Positional Encoding (PoPE) encodes positional information using orthogonal Legendre polynomials.
We show that transformer models with PoPE outperform baseline transformer models on the Multi30k English-to-German translation task.
We will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
 arXiv (2024-04-29T10:30:59Z)
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation [69.68831888599476]
 We develop a new positional encoding method called Bilevel Positional Encoding (BiPE).
Theoretical analysis shows this disentanglement of positional information makes learning more effective.
Our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
 arXiv (2024-01-29T18:59:07Z)
- CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling [10.26356931263957]
 We introduce CONFLATOR: a neural language modeling approach for code-mixed languages.
We show that rotatory positional encodings along with switching point information yield the best results.
CONFLATOR outperforms the state of the art on two tasks based on code-mixed Hindi and English.
 arXiv (2023-09-11T07:02:13Z)
- Word Order Matters when you Increase Masking [70.29624135819884]
 We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
 arXiv (2022-11-08T18:14:04Z)
- The Impact of Positional Encodings on Multilingual Compression [3.454503173118508]
 Several modifications have been proposed over the sinusoidal positional encodings used in the original transformer architecture.
We first show that, surprisingly, while these modifications tend to improve monolingual language models, none of them results in better multilingual language models.
 arXiv (2021-09-11T23:22:50Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
 We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
 arXiv (2020-10-30T03:41:38Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
 We show that in absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
 arXiv (2020-06-28T13:11:02Z)
- Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
 Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
Because word order differs across languages, modeling cross-lingual positional relationships may help self-attention networks (SANs) handle these divergences.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure of the input sentence.
 arXiv (2020-04-28T05:23:43Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
 We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE)
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
 arXiv (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     