What Do Position Embeddings Learn? An Empirical Study of Pre-Trained
Language Model Positional Encoding
- URL: http://arxiv.org/abs/2010.04903v1
- Date: Sat, 10 Oct 2020 05:03:14 GMT
- Title: What Do Position Embeddings Learn? An Empirical Study of Pre-Trained
Language Model Positional Encoding
- Authors: Yu-An Wang, Yun-Nung Chen
- Abstract summary: This paper provides new insight into pre-trained position embeddings through feature-level analysis and empirical experiments on most of the iconic NLP tasks.
The experimental results can guide future work in choosing a suitable positional encoding function for specific tasks, given the properties of the application.
- Score: 42.011175069706816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, pre-trained Transformers have dominated the majority of NLP
benchmark tasks. Many variants of pre-trained Transformers keep emerging,
and most focus on designing different pre-training objectives or variants
of self-attention. Embedding position information in the self-attention
mechanism is also an indispensable factor in Transformers; however, it is often
discussed only in passing. Therefore, this paper carries out an empirical study on the
position embeddings of mainstream pre-trained Transformers, focusing
on two questions: 1) Do position embeddings really learn the meaning of
positions? 2) How do these different learned position embeddings affect
Transformers on NLP tasks? This paper provides new insight into
pre-trained position embeddings through feature-level analysis and empirical
experiments on most of the iconic NLP tasks. We believe these experimental
results can guide future work in choosing a suitable positional encoding
function for specific tasks, given the properties of the application.
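As a rough illustration of the kind of feature-level analysis the paper describes, the sketch below loads the learned absolute position embeddings of a pre-trained BERT model and probes whether nearby positions are more similar than distant ones. The model choice and the cosine-similarity probe are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of feature-level analysis of learned position embeddings,
# assuming the Hugging Face transformers library; the cosine-similarity probe
# is an illustrative choice, not necessarily the paper's exact analysis.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# BERT learns absolute position embeddings: one vector per position (512 x 768).
pos_emb = model.embeddings.position_embeddings.weight.detach()

# Normalize so dot products become cosine similarities between positions.
pos_emb = pos_emb / pos_emb.norm(dim=-1, keepdim=True)
sim = pos_emb @ pos_emb.T  # (512, 512) position-position similarity matrix

# If the embeddings encode position order, nearby positions should be
# more similar than distant ones.
print("sim(10, 11)  =", sim[10, 11].item())
print("sim(10, 200) =", sim[10, 200].item())
```

Sinusoidal encodings exhibit a smooth, periodic similarity structure by construction; whether learned embeddings recover a comparable ordering is exactly the kind of question a feature-level analysis can answer.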
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
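A hedged sketch of the disentangled idea in this entry: a lightweight, trainable module synthesizes a task-specific query that is prepended to the frozen backbone's patch tokens. The module shape and names below are assumptions for illustration, not the paper's actual architecture.

```python
# A minimal sketch, assuming a frozen ViT backbone with 768-dim tokens;
# only the small QuerySynthesizer below would be trained.
import torch
import torch.nn as nn

class QuerySynthesizer(nn.Module):
    """Lightweight module, independent of the frozen pre-trained model."""
    def __init__(self, d_model: int, hidden: int = 32):
        super().__init__()
        self.seed = nn.Parameter(torch.zeros(1, 1, hidden))
        self.proj = nn.Linear(hidden, d_model)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Synthesize one task-specific query token per example.
        return self.proj(self.seed).expand(batch_size, -1, -1)

d_model = 768
synth = QuerySynthesizer(d_model)             # the only trainable parameters
patch_tokens = torch.randn(4, 196, d_model)   # frozen ViT patch embeddings
tokens = torch.cat([synth(4), patch_tokens], dim=1)  # query + patches
print(tokens.shape)  # torch.Size([4, 197, 768])
```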
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task [40.85615657802704]
This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers.
We present theoretical and empirical analysis using both causal and non-causal Transformer architectures.
arXiv Detail & Related papers (2024-05-30T20:52:23Z)
- Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs [18.832135309689736]
Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts.
Recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information.
We develop a Position-Aware Parameter Efficient Fine-Tuning (PAPEFT) approach, which is composed of a data augmentation technique and an efficient parameter adapter.
arXiv Detail & Related papers (2024-04-01T19:04:17Z)
- In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics follow a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
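The mechanism behind this finding can be demonstrated with a toy model: under causal self-attention, position t aggregates over t tokens, so the variance of its output shrinks with depth into the sequence even when no positional signal is provided. The uniform-attention simplification below is an assumption for illustration, not the paper's setup.

```python
# A toy demonstration of variance shrinkage under causal self-attention:
# position t averages over t+1 values, so its output variance decays
# roughly like 1/(t+1) even with no position embeddings.
import torch

torch.manual_seed(0)
T, d = 128, 64
values = torch.randn(T, d)  # i.i.d. token value vectors, no position signal

# Causal uniform attention: output at position t is the mean of values[:t+1].
causal = torch.tril(torch.ones(T, T))
attn = causal / causal.sum(dim=-1, keepdim=True)
out = attn @ values

var_per_pos = out.var(dim=-1)  # variance of each position's output vector
for t in [0, 3, 15, 63, 127]:
    print(f"position {t:3d}: variance {var_per_pos[t].item():.4f}")
# Variance decreases with position, so position is recoverable from it.
```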
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph [0.0]
This work reports results on adopting prompt-based training of transformers for scholarly knowledge graph object prediction.
It deviates from other works that propose entity and relation extraction pipelines for predicting objects of a scholarly knowledge graph.
We find that (i) as expected, transformer models tested out-of-the-box underperform on a new domain of data, and (ii) prompt-based training of the models achieves performance boosts of up to 40% in a relaxed evaluation setting.
arXiv Detail & Related papers (2023-05-22T10:35:18Z)
- Position Prediction as an Effective Pretraining Strategy [20.925906203643883]
We propose a novel but surprisingly simple alternative to content reconstruction: predicting locations from content, without providing positional information for it.
Our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods.
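A minimal sketch of such a position-prediction objective: content embeddings are fed to an encoder with no positional information, and a classification head is trained to recover each element's position. The toy encoder and module sizes are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of position-prediction pretraining: the encoder sees
# content with NO position information and must predict each element's
# position. Sizes and modules are illustrative assumptions.
import torch
import torch.nn as nn

T, d, vocab = 32, 64, 1000
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)
embed = nn.Embedding(vocab, d)  # content embeddings only, no positions added
pos_head = nn.Linear(d, T)      # classify each token's position in [0, T)

tokens = torch.randint(0, vocab, (8, T))      # a batch of 8 sequences
hidden = encoder(embed(tokens))               # (8, T, d)
logits = pos_head(hidden)                     # (8, T, T)
target = torch.arange(T).expand(8, T)         # true position of each slot
loss = nn.functional.cross_entropy(
    logits.reshape(-1, T), target.reshape(-1)
)
loss.backward()  # in practice, optimized over a large corpus
```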
arXiv Detail & Related papers (2022-07-15T17:10:48Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
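A hedged sketch of the multi-task discriminator objective this entry describes: one head performs ELECTRA-style replaced-token detection, and a second scores a small candidate set to select the original token. The heads and shapes below are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a two-headed ELECTRA-style discriminator loss:
# binary replaced-token detection plus original-token selection from
# K candidates. All modules and shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, d, K = 8, 32, 64, 5   # batch, length, hidden size, candidates per slot
hidden = torch.randn(B, T, d)       # discriminator encoder outputs (stand-in)
detect_head = nn.Linear(d, 1)       # head 1: replaced-token detection
select_head = nn.Linear(d, d)       # head 2: projects hidden for scoring

is_replaced = torch.randint(0, 2, (B, T)).float()
detect_loss = F.binary_cross_entropy_with_logits(
    detect_head(hidden).squeeze(-1), is_replaced
)

cand_emb = torch.randn(B, T, K, d)  # embeddings of the K candidate tokens
scores = (select_head(hidden).unsqueeze(2) * cand_emb).sum(-1)  # (B, T, K)
original_idx = torch.randint(0, K, (B, T))  # index of the original token
select_loss = F.cross_entropy(scores.reshape(-1, K), original_idx.reshape(-1))

loss = detect_loss + select_loss    # joint multi-task objective
print(float(loss))
```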
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
- Position Information in Transformers: An Overview [6.284464997330884]
This paper provides an overview of common methods to incorporate position information into Transformer models.
The objective of this survey is to showcase that position information in Transformers is a vibrant and extensive research area.
arXiv Detail & Related papers (2021-02-22T15:03:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.