Position Information in Transformers: An Overview
- URL: http://arxiv.org/abs/2102.11090v1
- Date: Mon, 22 Feb 2021 15:03:23 GMT
- Title: Position Information in Transformers: An Overview
- Authors: Philipp Dufter, Martin Schmitt, Hinrich Schütze
- Abstract summary: This paper provides an overview of common methods to incorporate position information into Transformer models.
One objective of this survey is to showcase that position information in Transformers is a vibrant and extensive research area.
- Score: 6.284464997330884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are arguably the main workhorse in recent Natural Language
Processing research. By definition a Transformer is invariant with respect to
reorderings of the input. However, language is inherently sequential and word
order is essential to the semantics and syntax of an utterance. In this paper,
we provide an overview of common methods to incorporate position information
into Transformer models. The objectives of this survey are to i) showcase that
position information in Transformers is a vibrant and extensive research area;
ii) enable the reader to compare existing methods by providing a unified
notation and meaningful clustering; iii) indicate what characteristics of an
application should be taken into account when selecting a position encoding;
iv) provide stimuli for future research.
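As a concrete reference point for the kind of methods the survey covers, below is a minimal numpy sketch of the fixed sinusoidal encoding introduced with the original Transformer, one common way to inject absolute position information; the sequence length and model dimension used here are arbitrary toy values, not anything prescribed by the survey.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(max_len)[:, None]          # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model // 2), the 2i values
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Position is injected by adding the encoding to the token embeddings, which breaks
# the permutation invariance of self-attention described in the abstract.
token_embeddings = np.random.randn(16, 512)          # (sequence length, d_model), toy values
transformer_inputs = token_embeddings + sinusoidal_positional_encoding(16, 512)
```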
Related papers
- Sparse Autoencoders Can Interpret Randomly Initialized Transformers [21.142967037533175]
Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the internal representations of transformers.
We apply SAEs to 'interpret' random transformers, i.e., transformers where the parameters are sampled IID from a Gaussian rather than trained on text data.
We find that random and trained transformers produce similarly interpretable SAE latents, and we confirm this finding quantitatively using an open-source auto-interpretability pipeline.
arXiv Detail & Related papers (2025-01-29T16:11:12Z)
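For context on the technique named above: a sparse autoencoder in this interpretability setting is typically a single hidden layer trained to reconstruct model activations under an L1 sparsity penalty. The following numpy sketch shows the usual forward pass and loss; it is not the authors' pipeline, and all dimensions and the toy activation batch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_latents = 512, 4096                       # hypothetical sizes
W_enc = rng.normal(0.0, 0.02, (d_model, n_latents))
W_dec = rng.normal(0.0, 0.02, (n_latents, d_model))
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations into sparse latents, then reconstruct them."""
    latents = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)   # ReLU encoder
    recon = latents @ W_dec + b_dec
    return latents, recon

# Training minimizes reconstruction error plus an L1 penalty that keeps the latents sparse.
activations = rng.normal(size=(32, d_model))         # toy batch standing in for model activations
latents, recon = sae_forward(activations)
l1_coeff = 1e-3
loss = np.mean((recon - activations) ** 2) + l1_coeff * np.abs(latents).mean()
```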
- Enhancing Transformers for Generalizable First-Order Logical Entailment [51.04944136538266]
This paper investigates the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge.
The first-order reasoning capability of transformers is assessed through their ability to perform first-order logical entailment.
We propose a more sophisticated, logic-aware architecture, TEGA, to enhance the capability for generalizable first-order logical entailment in transformers.
arXiv Detail & Related papers (2025-01-01T07:05:32Z)
- The Belief State Transformer [50.196123952714245]
"Belief State Transformer" is a next-token predictor that takes both a prefix and suffix as inputs.
It effectively learns to solve challenging problems that conventional forward-only transformers struggle with.
arXiv Detail & Related papers (2024-10-30T23:26:06Z)
- Survey: Transformer-based Models in Data Modality Conversion [0.8136541584281987]
Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information.
This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications.
arXiv Detail & Related papers (2024-08-08T18:39:14Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
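A toy illustration of the intuition behind the claim above, under the simplifying assumption of uniform causal attention over i.i.d. value vectors (not the paper's model or derivation): the output at position t is then an average of t values, so its variance shrinks roughly like 1/t, which by itself carries position information.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, seq_len, d_model = 1000, 64, 64

# Toy setting: causal attention with uniform weights over i.i.d. value vectors.
# The output at position t is then a running mean of t values, so its variance
# shrinks roughly like 1/t -- a positional signal without any positional embedding.
values = rng.normal(size=(n_samples, seq_len, d_model))
running_mean = np.cumsum(values, axis=1) / np.arange(1, seq_len + 1)[None, :, None]

variance_by_position = running_mean.var(axis=(0, 2))
print(variance_by_position[:8])                      # decreasing with position, close to 1/t
```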
- Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives [21.164122592628388]
The Transformer, one of the latest technological advances in deep learning, has gained prevalence in natural language processing and computer vision.
We offer a comprehensive review of the state-of-the-art Transformer-based approaches for medical imaging.
arXiv Detail & Related papers (2022-06-02T16:38:31Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
We adapt the relative position encoding scheme to the Speech Transformer.
As a result, the network can better adapt to the variable distributions present in speech data.
arXiv Detail & Related papers (2020-05-20T09:53:06Z)
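For background on the scheme referenced above: relative position encodings typically add a learned embedding indexed by the (clipped) offset j - i to the attention logits, instead of adding absolute positions to the input. The numpy sketch below shows this generic formulation; it is not the Speech Transformer implementation, and the head size, sequence length, and clipping range are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_head, max_rel = 10, 64, 4                 # illustrative sizes; max_rel clips the offset

q = rng.normal(size=(seq_len, d_head))
k = rng.normal(size=(seq_len, d_head))
rel_emb = rng.normal(size=(2 * max_rel + 1, d_head)) # one learned vector per clipped offset

# Relative offsets j - i, clipped to [-max_rel, max_rel] and shifted to valid table indices.
offsets = np.clip(np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None],
                  -max_rel, max_rel) + max_rel

# Attention logits: content term q_i . k_j plus a position term q_i . a_{j-i}.
logits = (q @ k.T + np.einsum("id,ijd->ij", q, rel_emb[offsets])) / np.sqrt(d_head)
weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # softmax over key positions
```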
- Segatron: Segment-Aware Transformer for Language Modeling and Understanding [79.84562707201323]
We propose a segment-aware Transformer (Segatron) to generate better contextual representations from sequential tokens.
We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model.
We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset.
arXiv Detail & Related papers (2020-04-30T17:38:27Z)
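The segment-aware mechanism mentioned in the Segatron entry can be pictured as giving each token several granularities of position. Below is a rough numpy sketch of that general idea, with separate learned embeddings for paragraph, sentence, and token positions summed into the input; the table sizes and the toy indices are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 256                                        # hypothetical size
tok_emb = rng.normal(0, 0.02, (1000, d_model))       # toy token-embedding table
para_pos_emb = rng.normal(0, 0.02, (32, d_model))    # position of the paragraph in the document
sent_pos_emb = rng.normal(0, 0.02, (64, d_model))    # position of the sentence in its paragraph
tok_pos_emb = rng.normal(0, 0.02, (512, d_model))    # position of the token in its sentence

# One toy sequence: token ids plus, per token, its paragraph, sentence, and token indices.
token_ids = np.array([5, 17, 17, 42, 7, 99])
para_idx = np.array([0, 0, 0, 0, 1, 1])
sent_idx = np.array([0, 0, 1, 1, 0, 0])
tok_idx = np.array([0, 1, 0, 1, 0, 1])

# Segment-aware input: sum the three granularities of position with the token embedding.
inputs = (tok_emb[token_ids] + para_pos_emb[para_idx]
          + sent_pos_emb[sent_idx] + tok_pos_emb[tok_idx])
```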
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.