The Locality and Symmetry of Positional Encodings
- URL: http://arxiv.org/abs/2310.12864v1
- Date: Thu, 19 Oct 2023 16:15:15 GMT
- Title: The Locality and Symmetry of Positional Encodings
- Authors: Lihu Chen, Ga\"el Varoquaux, Fabian M. Suchanek
- Abstract summary: We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
- Score: 9.246374019271938
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Positional Encodings (PEs) are used to inject word-order information into
transformer-based language models. While they can significantly enhance the
quality of sentence representations, their specific contribution to language
models is not fully understood, especially given recent findings that various
positional encodings are insensitive to word order. In this work, we conduct a
systematic study of positional encodings in \textbf{Bidirectional Masked
Language Models} (BERT-style), which complements existing work in three
aspects: (1) We uncover the core function of PEs by identifying two common
properties, Locality and Symmetry; (2) We show that the two properties are
closely correlated with the performances of downstream tasks; (3) We quantify
the weakness of current PEs by introducing two new probing tasks, on which
current PEs perform poorly. We believe that these results are the basis for
developing better PEs for transformer-based language models. The code is
available at https://github.com/tigerchen52/locality_symmetry
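The abstract does not spell out how Locality and Symmetry are quantified, so the snippet below is only a minimal sketch: it assumes the two properties are read off a position-to-position score matrix (for instance, the attention bias attributable to the PEs), and the metric definitions are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def symmetry_score(bias: np.ndarray) -> float:
    """Cosine similarity between a position-to-position score matrix and its
    transpose; 1.0 means the pair (i, j) always scores the same as (j, i)."""
    a, b = bias.ravel(), bias.T.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def locality_score(bias: np.ndarray) -> float:
    """Correlation between a pair's score and the negative distance -|i - j|;
    positive values mean nearby positions systematically score higher."""
    n = bias.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    neg_dist = -np.abs(i - j).astype(float)
    return float(np.corrcoef(bias.ravel(), neg_dist.ravel())[0, 1])

# Toy check: a Gaussian bump around the diagonal is both local and symmetric.
n = 64
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
toy_bias = np.exp(-((i - j) ** 2) / 32.0)
print(symmetry_score(toy_bias), locality_score(toy_bias))  # both high
```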
Related papers
- PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models [0.0]
Polynomial Based Positional Encoding (PoPE) encodes positional information using orthogonal Legendre polynomials (a minimal sketch follows this entry).
We show that transformer models with PoPE outperform baseline transformer models on the Multi30k English-to-German translation task.
We will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
arXiv Detail & Related papers (2024-04-29T10:30:59Z)
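This summary gives only PoPE's core idea, not its exact construction (normalization, or how the table enters the model), so the following is a minimal sketch under that assumption: a positional table whose columns are orthogonal Legendre polynomials evaluated at positions mapped into [-1, 1].

```python
import numpy as np

def legendre_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Illustrative PoPE-style table: row t holds the first d_model Legendre
    polynomials evaluated at position t, after mapping positions into the
    Legendre domain [-1, 1]."""
    x = np.linspace(-1.0, 1.0, seq_len)
    # legvander returns shape (seq_len, d_model); column i is P_i(x).
    return np.polynomial.legendre.legvander(x, d_model - 1)

pe = legendre_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```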
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation [69.68831888599476]
We develop a new positional encoding method called Bilevel Positional Encoding (BiPE).
Theoretical analysis shows this disentanglement of positional information makes learning more effective.
Our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
arXiv Detail & Related papers (2024-01-29T18:59:07Z)
- CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling [10.26356931263957]
We introduce CONFLATOR: a neural language modeling approach for code-mixed languages.
We show that rotatory positional encodings, together with switching point information, yield the best results (the basic rotary step is sketched below).
CONFLATOR outperforms the state of the art on two tasks based on code-mixed Hindi and English.
arXiv Detail & Related papers (2023-09-11T07:02:13Z)
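CONFLATOR's switching-point conditioning is not detailed in this summary; the sketch below shows only the standard rotary positional encoding step that such approaches build on, applied to whatever position indices the model supplies.

```python
import numpy as np

def rotary_encode(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotary positional encoding for x of shape (seq_len, d) with d even:
    each dimension pair (2i, 2i+1) is rotated by positions * base**(-2i/d),
    so relative offsets appear as relative rotations in query-key dot products."""
    seq_len, d = x.shape
    freqs = base ** (-np.arange(d // 2) * 2.0 / d)   # (d/2,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even / odd dimensions
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# In a switching-point scheme the position indices themselves would change
# at code-switch boundaries; here they are simply 0..seq_len-1.
q = rotary_encode(np.random.randn(10, 16), positions=np.arange(10, dtype=float))
print(q.shape)  # (10, 16)
```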
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information from co-occurrences alone.
arXiv Detail & Related papers (2022-11-08T18:14:04Z)
- The Impact of Positional Encodings on Multilingual Compression [3.454503173118508]
Several modifications have been proposed to the sinusoidal positional encodings used in the original transformer architecture (sketched below).
We first show that, surprisingly, while these modifications tend to improve monolingual language models, none of them results in better multilingual language models.
arXiv Detail & Related papers (2021-09-11T23:22:50Z)
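For reference, the sinusoidal scheme that these modifications are compared against is the fixed table from the original transformer; a minimal NumPy version:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Original transformer PE table (Vaswani et al., 2017):
    PE[pos, 2i] = sin(pos / 10000**(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))."""
    positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
    div = 10000.0 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```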
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It effectively avoids degenerating into predicting masked words conditioned only on the context of their own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE), which unties these correlations (sketched below).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
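The exact TUPE formulation is not given in this summary, so the sketch below only illustrates the untying idea: give word content and absolute positions their own query/key projections and sum the resulting score matrices, instead of adding positional embeddings to word embeddings before one shared projection. The variable names (wq, wk, uq, uk) and the sqrt(2d) scaling are illustrative choices, not a faithful reproduction of the paper.

```python
import numpy as np

def untied_attention_logits(x, p, wq, wk, uq, uk):
    """Untied attention logits: the word-to-word term and the
    position-to-position term use separate projections and are summed,
    so word and position information are not mixed by a shared projection.

    x: (seq_len, d) word embeddings, p: (seq_len, d) positional embeddings."""
    d = x.shape[-1]
    word_scores = (x @ wq) @ (x @ wk).T   # word-to-word correlations
    pos_scores = (p @ uq) @ (p @ uk).T    # position-to-position correlations
    return (word_scores + pos_scores) / np.sqrt(2 * d)

seq_len, d = 8, 16
rng = np.random.default_rng(0)
x, p = rng.normal(size=(seq_len, d)), rng.normal(size=(seq_len, d))
wq, wk, uq, uk = (rng.normal(size=(d, d)) for _ in range(4))
print(untied_attention_logits(x, p, wq, wk, uq, uk).shape)  # (8, 8)
```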
- Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
Due to word order divergences across different languages, modeling the cross-lingual positional relationships might help self-attention networks (SANs) tackle this problem.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure of the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 combinations of encoder architectures and linguistic features, trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)