Position Information in Transformers: An Overview
- URL: http://arxiv.org/abs/2102.11090v1
- Date: Mon, 22 Feb 2021 15:03:23 GMT
- Title: Position Information in Transformers: An Overview
- Authors: Philipp Dufter, Martin Schmitt, Hinrich Schütze
- Abstract summary: This paper provides an overview of common methods to incorporate position information into Transformer models.
The objectives of this survey include showcasing that position information in Transformers is a vibrant and extensive research area.
- Score: 6.284464997330884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are arguably the main workhorse in recent Natural Language
Processing research. By definition a Transformer is invariant with respect to
reorderings of the input. However, language is inherently sequential and word
order is essential to the semantics and syntax of an utterance. In this paper,
we provide an overview of common methods to incorporate position information
into Transformer models. The objectives of this survey are to i) showcase that
position information in Transformers is a vibrant and extensive research area;
ii) enable the reader to compare existing methods by providing a unified
notation and meaningful clustering; iii) indicate what characteristics of an
application should be taken into account when selecting a position encoding;
iv) provide stimuli for future research.
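To make the survey's subject concrete, the sketch below implements the fixed sinusoidal absolute position encoding of the original Transformer (Vaswani et al., 2017), one of the standard methods such an overview covers. This is a minimal NumPy illustration; the function name and the add-to-embeddings usage in the final comment are assumptions made for this example, not code from the paper.

```python
# Minimal sketch (not from the paper): sinusoidal absolute position encoding
# as in the original Transformer. Assumes d_model is even.
import numpy as np

def sinusoidal_position_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix with
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(max_len)[:, None]                 # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (max_len, d_model / 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                            # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                            # odd dimensions: cosine
    return pe

# Typical use: the encoding is added to the token embeddings once, before the
# first layer, e.g. x = token_embeddings + sinusoidal_position_encoding(seq_len, d_model)
```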
Related papers
- Learning to Achieve Goals with Belief State Transformers [50.196123952714245]
"Belief State Transformer" is a next-token predictor that takes both a prefix and suffix as inputs.
The Belief State Transformer effectively learns to solve challenging problems that conventional forward-only transformers struggle with.
arXiv Detail & Related papers (2024-10-30T23:26:06Z)
- Survey: Transformer-based Models in Data Modality Conversion [0.8136541584281987]
Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information.
This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications.
arXiv Detail & Related papers (2024-08-08T18:39:14Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- An Introduction to Transformers [23.915718146956355]
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points.
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
arXiv Detail & Related papers (2023-04-20T14:54:19Z)
- Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review [6.953789750981636]
We provide an encyclopedic review of the applications of Transformers in medical imaging.
Specifically, we present a systematic and thorough review of relevant recent Transformer literature for different medical image analysis tasks.
arXiv Detail & Related papers (2023-01-09T16:56:23Z)
- A Length-Extrapolatable Transformer [98.54835576985664]
We focus on length extrapolation, i.e., training on short texts while evaluating longer sequences.
We introduce a relative position embedding to explicitly maximize attention resolution.
We evaluate different Transformer variants with language modeling.
arXiv Detail & Related papers (2022-12-20T18:56:20Z)
- Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives [21.164122592628388]
Transformer, the latest technological advance of deep learning, has gained prevalence in natural language processing and computer vision.
We offer a comprehensive review of the state-of-the-art Transformer-based approaches for medical imaging.
arXiv Detail & Related papers (2022-06-02T16:38:31Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
We adapt the relative position encoding scheme to the Speech Transformer; a general sketch of relative-position biasing in self-attention appears after this list.
As a result, the network can better adapt to the variable distributions present in speech data.
arXiv Detail & Related papers (2020-05-20T09:53:06Z)
- Segatron: Segment-Aware Transformer for Language Modeling and Understanding [79.84562707201323]
We propose a segment-aware Transformer (Segatron) to generate better contextual representations from sequential tokens.
We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model.
We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset.
arXiv Detail & Related papers (2020-04-30T17:38:27Z)
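Several entries above (the Length-Extrapolatable Transformer and the relative positional encoding for speech work in particular) rely on relative position encodings, which inject position information into the attention scores rather than into the input embeddings. The sketch below illustrates the general idea with a learned bias, indexed by the clipped query-key distance, added to the attention logits; it is an assumption-laden illustration of this family of methods, not the exact scheme of any paper listed here, and the function and parameter names are hypothetical.

```python
# Illustrative sketch only: relative-position biasing in single-head
# self-attention. A learned bias b[clip(j - i)] is added to each attention
# logit; this shows the general family of relative position encodings, not
# the exact formulation of any specific paper above.
import numpy as np

def attention_with_relative_bias(q, k, v, rel_bias, max_dist):
    """q, k, v: (seq_len, d_head) arrays; rel_bias: (2 * max_dist + 1,) learned biases."""
    seq_len, d_head = q.shape
    logits = q @ k.T / np.sqrt(d_head)                       # (seq_len, seq_len)

    # Relative distance j - i between key j and query i, clipped to [-max_dist, max_dist]
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    rel = np.clip(j - i, -max_dist, max_dist) + max_dist     # shift into valid index range
    logits = logits + rel_bias[rel]                          # position-dependent bias

    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over keys
    return weights @ v                                       # (seq_len, d_head)

# Example with random inputs:
# rng = np.random.default_rng(0)
# q = k = v = rng.normal(size=(8, 16))
# out = attention_with_relative_bias(q, k, v, rng.normal(size=(9,)), max_dist=4)
```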
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.