The Locality and Symmetry of Positional Encodings
- URL: http://arxiv.org/abs/2310.12864v1
- Date: Thu, 19 Oct 2023 16:15:15 GMT
- Title: The Locality and Symmetry of Positional Encodings
- Authors: Lihu Chen, Ga\"el Varoquaux, Fabian M. Suchanek
- Abstract summary: We conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style).
We uncover the core function of PEs by identifying two common properties, Locality and Symmetry.
We quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly.
- Score: 9.246374019271938
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Positional Encodings (PEs) are used to inject word-order information into
transformer-based language models. While they can significantly enhance the
quality of sentence representations, their specific contribution to language
models is not fully understood, especially given recent findings that various
positional encodings are insensitive to word order. In this work, we conduct a
systematic study of positional encodings in \textbf{Bidirectional Masked
Language Models} (BERT-style), which complements existing work in three
aspects: (1) We uncover the core function of PEs by identifying two common
properties, Locality and Symmetry; (2) We show that the two properties are
closely correlated with the performances of downstream tasks; (3) We quantify
the weakness of current PEs by introducing two new probing tasks, on which
current PEs perform poorly. We believe that these results are the basis for
developing better PEs for transformer-based language models. The code is
available at https://github.com/tigerchen52/locality_symmetry
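The abstract does not spell out how Locality and Symmetry are quantified, so the snippet below is only a minimal sketch: it assumes the two properties are read off a position-to-position score matrix (for instance, the attention bias attributable to the PEs), and the metric definitions are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def symmetry_score(bias: np.ndarray) -> float:
    """Cosine similarity between a position-to-position score matrix and its
    transpose; 1.0 means the pair (i, j) always scores the same as (j, i)."""
    a, b = bias.ravel(), bias.T.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def locality_score(bias: np.ndarray) -> float:
    """Correlation between a pair's score and the negative distance -|i - j|;
    positive values mean nearby positions systematically score higher."""
    n = bias.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    neg_dist = -np.abs(i - j).astype(float)
    return float(np.corrcoef(bias.ravel(), neg_dist.ravel())[0, 1])

# Toy check: a Gaussian bump around the diagonal is both local and symmetric.
n = 64
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
toy_bias = np.exp(-((i - j) ** 2) / 32.0)
print(symmetry_score(toy_bias), locality_score(toy_bias))  # both high
```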
Related papers
- PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models [0.0]
Polynomial Based Positional Encoding (PoPE) encodes positional information using orthogonal Legendre polynomials (a minimal sketch follows this entry).
We show that transformer models with PoPE outperform baseline transformer models on the Multi30k English-to-German translation task.
We will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
arXiv Detail & Related papers (2024-04-29T10:30:59Z)
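This summary gives only PoPE's core idea, not its exact construction (normalization, or how the table enters the model), so the following is a minimal sketch under that assumption: a positional table whose columns are orthogonal Legendre polynomials evaluated at positions mapped into [-1, 1].

```python
import numpy as np

def legendre_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Illustrative PoPE-style table: row t holds the first d_model Legendre
    polynomials evaluated at position t, after mapping positions into the
    Legendre domain [-1, 1]."""
    x = np.linspace(-1.0, 1.0, seq_len)
    # legvander returns shape (seq_len, d_model); column i is P_i(x).
    return np.polynomial.legendre.legvander(x, d_model - 1)

pe = legendre_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```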
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation [69.68831888599476]
We develop a new positional encoding method called Bilevel Positional Encoding (BiPE).
Theoretical analysis shows this disentanglement of positional information makes learning more effective.
Our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
arXiv Detail & Related papers (2024-01-29T18:59:07Z)
- CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling [10.26356931263957]
We introduce CONFLATOR: a neural language modeling approach for code-mixed languages.
We show that rotatory positional encodings, together with switching point information, yield the best results (the basic rotary step is sketched below).
CONFLATOR outperforms the state of the art on two tasks based on code-mixed Hindi and English.
arXiv Detail & Related papers (2023-09-11T07:02:13Z)
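CONFLATOR's switching-point conditioning is not detailed in this summary; the sketch below shows only the standard rotary positional encoding step that such approaches build on, applied to whatever position indices the model supplies.

```python
import numpy as np

def rotary_encode(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotary positional encoding for x of shape (seq_len, d) with d even:
    each dimension pair (2i, 2i+1) is rotated by positions * base**(-2i/d),
    so relative offsets appear as relative rotations in query-key dot products."""
    seq_len, d = x.shape
    freqs = base ** (-np.arange(d // 2) * 2.0 / d)   # (d/2,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even / odd dimensions
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# In a switching-point scheme the position indices themselves would change
# at code-switch boundaries; here they are simply 0..seq_len-1.
q = rotary_encode(np.random.randn(10, 16), positions=np.arange(10, dtype=float))
print(q.shape)  # (10, 16)
```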
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information from co-occurrences alone.
arXiv Detail & Related papers (2022-11-08T18:14:04Z)
- The Impact of Positional Encodings on Multilingual Compression [3.454503173118508]
Several modifications have been proposed to the sinusoidal positional encodings used in the original transformer architecture (sketched below).
We first show that, surprisingly, while these modifications tend to improve monolingual language models, none of them results in better multilingual language models.
arXiv Detail & Related papers (2021-09-11T23:22:50Z)
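For reference, the sinusoidal scheme that these modifications are compared against is the fixed table from the original transformer; a minimal NumPy version:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Original transformer PE table (Vaswani et al., 2017):
    PE[pos, 2i] = sin(pos / 10000**(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))."""
    positions = np.arange(seq_len)[:, None]                # (seq_len, 1)
    div = 10000.0 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64)
```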
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It effectively avoids degenerating into predicting masked words conditioned only on the context of their own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE), which unties these correlations (sketched below).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
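The exact TUPE formulation is not given in this summary, so the sketch below only illustrates the untying idea: give word content and absolute positions their own query/key projections and sum the resulting score matrices, instead of adding positional embeddings to word embeddings before one shared projection. The variable names (wq, wk, uq, uk) and the sqrt(2d) scaling are illustrative choices, not a faithful reproduction of the paper.

```python
import numpy as np

def untied_attention_logits(x, p, wq, wk, uq, uk):
    """Untied attention logits: the word-to-word term and the
    position-to-position term use separate projections and are summed,
    so word and position information are not mixed by a shared projection.

    x: (seq_len, d) word embeddings, p: (seq_len, d) positional embeddings."""
    d = x.shape[-1]
    word_scores = (x @ wq) @ (x @ wk).T   # word-to-word correlations
    pos_scores = (p @ uq) @ (p @ uk).T    # position-to-position correlations
    return (word_scores + pos_scores) / np.sqrt(2 * d)

seq_len, d = 8, 16
rng = np.random.default_rng(0)
x, p = rng.normal(size=(seq_len, d)), rng.normal(size=(seq_len, d))
wq, wk, uq, uk = (rng.normal(size=(d, d)) for _ in range(4))
print(untied_attention_logits(x, p, wq, wk, uq, uk).shape)  # (8, 8)
```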
- Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
Due to word order divergences across different languages, modeling the cross-lingual positional relationships might help self-attention networks (SANs) tackle this problem.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure of the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 combinations of encoder architectures and linguistic features, trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)