Position Information in Transformers: An Overview
- URL: http://arxiv.org/abs/2102.11090v1
- Date: Mon, 22 Feb 2021 15:03:23 GMT
- Title: Position Information in Transformers: An Overview
- Authors: Philipp Dufter, Martin Schmitt, Hinrich Schütze
- Abstract summary: This paper provides an overview of common methods to incorporate position information into Transformer models.
One objective of this survey is to showcase that position information in Transformers is a vibrant and extensive research area.
- Score: 6.284464997330884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers are arguably the main workhorse in recent Natural Language
Processing research. By definition a Transformer is invariant with respect to
reorderings of the input. However, language is inherently sequential and word
order is essential to the semantics and syntax of an utterance. In this paper,
we provide an overview of common methods to incorporate position information
into Transformer models. The objectives of this survey are to i) showcase that
position information in Transformers is a vibrant and extensive research area;
ii) enable the reader to compare existing methods by providing a unified
notation and meaningful clustering; iii) indicate what characteristics of an
application should be taken into account when selecting a position encoding;
iv) provide stimuli for future research.
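As a concrete reference point for the kind of methods the survey covers, below is a minimal numpy sketch of the fixed sinusoidal encoding introduced with the original Transformer, one common way to inject absolute position information; the sequence length and model dimension used here are arbitrary toy values, not anything prescribed by the survey.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(max_len)[:, None]          # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model // 2), the 2i values
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Position is injected by adding the encoding to the token embeddings, which breaks
# the permutation invariance of self-attention described in the abstract.
token_embeddings = np.random.randn(16, 512)          # (sequence length, d_model), toy values
transformer_inputs = token_embeddings + sinusoidal_positional_encoding(16, 512)
```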
Related papers
- Sparse Autoencoders Can Interpret Randomly Initialized Transformers [21.142967037533175]
Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the internal representations of transformers.
We apply SAEs to 'interpret' random transformers, i.e., transformers where the parameters are sampled IID from a Gaussian rather than trained on text data.
We find that random and trained transformers produce similarly interpretable SAE latents, and we confirm this finding quantitatively using an open-source auto-interpretability pipeline.
arXiv Detail & Related papers (2025-01-29T16:11:12Z)
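For context on the technique named above: a sparse autoencoder in this interpretability setting is typically a single hidden layer trained to reconstruct model activations under an L1 sparsity penalty. The following numpy sketch shows the usual forward pass and loss; it is not the authors' pipeline, and all dimensions and the toy activation batch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_latents = 512, 4096                       # hypothetical sizes
W_enc = rng.normal(0.0, 0.02, (d_model, n_latents))
W_dec = rng.normal(0.0, 0.02, (n_latents, d_model))
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations into sparse latents, then reconstruct them."""
    latents = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)   # ReLU encoder
    recon = latents @ W_dec + b_dec
    return latents, recon

# Training minimizes reconstruction error plus an L1 penalty that keeps the latents sparse.
activations = rng.normal(size=(32, d_model))         # toy batch standing in for model activations
latents, recon = sae_forward(activations)
l1_coeff = 1e-3
loss = np.mean((recon - activations) ** 2) + l1_coeff * np.abs(latents).mean()
```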
- Enhancing Transformers for Generalizable First-Order Logical Entailment [51.04944136538266]
This paper investigates the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge.
The first-order reasoning capability of transformers is assessed through their ability to perform first-order logical entailment.
We propose a more sophisticated, logic-aware architecture, TEGA, to enhance the capability for generalizable first-order logical entailment in transformers.
arXiv Detail & Related papers (2025-01-01T07:05:32Z)
- The Belief State Transformer [50.196123952714245]
"Belief State Transformer" is a next-token predictor that takes both a prefix and suffix as inputs.
It effectively learns to solve challenging problems that conventional forward-only transformers struggle with.
arXiv Detail & Related papers (2024-10-30T23:26:06Z)
- Survey: Transformer-based Models in Data Modality Conversion [0.8136541584281987]
Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information.
This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications.
arXiv Detail & Related papers (2024-08-08T18:39:14Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
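A toy illustration of the intuition behind the claim above, under the simplifying assumption of uniform causal attention over i.i.d. value vectors (not the paper's model or derivation): the output at position t is then an average of t values, so its variance shrinks roughly like 1/t, which by itself carries position information.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, seq_len, d_model = 1000, 64, 64

# Toy setting: causal attention with uniform weights over i.i.d. value vectors.
# The output at position t is then a running mean of t values, so its variance
# shrinks roughly like 1/t -- a positional signal without any positional embedding.
values = rng.normal(size=(n_samples, seq_len, d_model))
running_mean = np.cumsum(values, axis=1) / np.arange(1, seq_len + 1)[None, :, None]

variance_by_position = running_mean.var(axis=(0, 2))
print(variance_by_position[:8])                      # decreasing with position, close to 1/t
```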
- Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives [21.164122592628388]
The Transformer, one of the latest technological advances in deep learning, has gained prevalence in natural language processing and computer vision.
We offer a comprehensive review of the state-of-the-art Transformer-based approaches for medical imaging.
arXiv Detail & Related papers (2022-06-02T16:38:31Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
We adapt the relative position encoding scheme to the Speech Transformer.
As a result, the network can better adapt to the variable distributions present in speech data.
arXiv Detail & Related papers (2020-05-20T09:53:06Z)
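For background on the scheme referenced above: relative position encodings typically add a learned embedding indexed by the (clipped) offset j - i to the attention logits, instead of adding absolute positions to the input. The numpy sketch below shows this generic formulation; it is not the Speech Transformer implementation, and the head size, sequence length, and clipping range are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_head, max_rel = 10, 64, 4                 # illustrative sizes; max_rel clips the offset

q = rng.normal(size=(seq_len, d_head))
k = rng.normal(size=(seq_len, d_head))
rel_emb = rng.normal(size=(2 * max_rel + 1, d_head)) # one learned vector per clipped offset

# Relative offsets j - i, clipped to [-max_rel, max_rel] and shifted to valid table indices.
offsets = np.clip(np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None],
                  -max_rel, max_rel) + max_rel

# Attention logits: content term q_i . k_j plus a position term q_i . a_{j-i}.
logits = (q @ k.T + np.einsum("id,ijd->ij", q, rel_emb[offsets])) / np.sqrt(d_head)
weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # softmax over key positions
```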
- Segatron: Segment-Aware Transformer for Language Modeling and Understanding [79.84562707201323]
We propose a segment-aware Transformer (Segatron) to generate better contextual representations from sequential tokens.
We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model.
We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset.
arXiv Detail & Related papers (2020-04-30T17:38:27Z)
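The segment-aware mechanism mentioned in the Segatron entry can be pictured as giving each token several granularities of position. Below is a rough numpy sketch of that general idea, with separate learned embeddings for paragraph, sentence, and token positions summed into the input; the table sizes and the toy indices are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 256                                        # hypothetical size
tok_emb = rng.normal(0, 0.02, (1000, d_model))       # toy token-embedding table
para_pos_emb = rng.normal(0, 0.02, (32, d_model))    # position of the paragraph in the document
sent_pos_emb = rng.normal(0, 0.02, (64, d_model))    # position of the sentence in its paragraph
tok_pos_emb = rng.normal(0, 0.02, (512, d_model))    # position of the token in its sentence

# One toy sequence: token ids plus, per token, its paragraph, sentence, and token indices.
token_ids = np.array([5, 17, 17, 42, 7, 99])
para_idx = np.array([0, 0, 0, 0, 1, 1])
sent_idx = np.array([0, 0, 1, 1, 0, 0])
tok_idx = np.array([0, 1, 0, 1, 0, 1])

# Segment-aware input: sum the three granularities of position with the token embedding.
inputs = (tok_emb[token_ids] + para_pos_emb[para_idx]
          + sent_pos_emb[sent_idx] + tok_pos_emb[tok_idx])
```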
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.