RoFormer: Enhanced Transformer with Rotary Position Embedding
- URL: http://arxiv.org/abs/2104.09864v5
- Date: Wed, 8 Nov 2023 13:36:32 GMT
- Title: RoFormer: Enhanced Transformer with Rotary Position Embedding
- Authors: Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
- Abstract summary: We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage positional information.
RoPE encodes the absolute position with a rotation matrix while incorporating explicit relative position dependency into the self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
- Score: 9.01819510933327
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Position encoding has recently proven effective in the transformer
architecture. It provides valuable supervision for modeling dependencies between
elements at different positions of a sequence. In this paper, we first
investigate various methods of integrating positional information into the
learning process of transformer-based language models. We then propose a novel
method named Rotary Position Embedding (RoPE) to effectively leverage
positional information. Specifically, the proposed RoPE encodes the absolute
position with a rotation matrix while incorporating explicit relative position
dependency into the self-attention formulation. Notably, RoPE offers valuable
properties, including flexibility with respect to sequence length, inter-token
dependency that decays with increasing relative distance, and the ability to
equip linear self-attention with relative position encoding. Finally, we
evaluate the enhanced transformer with rotary position embedding, also called
RoFormer, on various long text classification benchmark datasets. Our
experiments show that it consistently outperforms its alternatives.
Furthermore, we provide a theoretical analysis to explain some of the
experimental results. RoFormer is already integrated into Hugging Face:
\url{https://huggingface.co/docs/transformers/model_doc/roformer}.
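Below is a minimal NumPy sketch of the mechanism the abstract describes, not the authors' reference implementation: each pair of dimensions (2i, 2i+1) of a query or key vector is rotated by an angle proportional to its position, and the resulting dot product depends only on the relative offset between positions. The function name `rope` and the shapes used here are our own illustrative choices.

```python
import numpy as np

def rope(x: np.ndarray, pos: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate vectors x of shape (n, d) as if they sat at positions pos (n,)."""
    d = x.shape[-1]
    assert d % 2 == 0, "embedding dimension must be even"
    # Pair frequencies theta_i = base^(-2i/d), one per dimension pair.
    theta = base ** (-np.arange(0, d, 2) / d)            # (d/2,)
    ang = pos[:, None] * theta[None, :]                  # (n, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    out = np.empty_like(x)
    out[..., 0::2] = x[..., 0::2] * cos - x[..., 1::2] * sin  # 2-D rotation
    out[..., 1::2] = x[..., 0::2] * sin + x[..., 1::2] * cos  # of each pair
    return out

# Relative-position property: the score between a rotated query and key
# depends only on the offset m - n, not on the absolute positions.
rng = np.random.default_rng(0)
q, k = rng.standard_normal((2, 64))
s_a = rope(q[None], np.array([3]))[0] @ rope(k[None], np.array([1]))[0]
s_b = rope(q[None], np.array([103]))[0] @ rope(k[None], np.array([101]))[0]
assert np.isclose(s_a, s_b)  # same offset of 2 at shifted absolute positions
```

Because the per-pair rotations compose additively, shifting both positions by the same amount leaves every attention score unchanged, which is the decaying, offset-only dependency the abstract highlights.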
Related papers
- Geotokens and Geotransformers [0.0]
This paper presents geotokens, input components for transformers, each linked to a specific geographical location.
Unlike tokens in typical language sequences, the order of these tokens matters less than their geographical coordinates.
arXiv Detail & Related papers (2024-03-23T22:02:56Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- Spherical Position Encoding for Transformers [0.0]
We introduce the notion of "geotokens" which are input elements for transformer architectures.
Unlike natural language, the sequential position is not important for the model, but the geographical coordinates are.
We formulate a position encoding mechanism based on the RoPE architecture, adjusted for spherical coordinates.
arXiv Detail & Related papers (2023-10-04T09:28:59Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism.
RoITr surpasses existing methods by at least 13 and 5 percentage points in Inlier Ratio and Registration Recall, respectively.
arXiv Detail & Related papers (2023-03-14T20:55:27Z)
- A Length-Extrapolatable Transformer [98.54835576985664]
We focus on length extrapolation, i.e., training on short texts while evaluating longer sequences.
We introduce a relative position embedding to explicitly maximize attention resolution.
We evaluate different Transformer variants with language modeling.
arXiv Detail & Related papers (2022-12-20T18:56:20Z)
- SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement [118.20816888815658]
We propose a novel deep architecture tailored for 3D point cloud applications, named as SPE-Net.
The embedded 'Selective Position Encoding (SPE)' procedure relies on an attention mechanism that can effectively attend to the underlying rotation condition of the input.
We demonstrate the merits of SPE-Net and the associated hypothesis on four benchmarks, showing evident improvements over state-of-the-art methods on both rotated and unrotated test data.
arXiv Detail & Related papers (2022-11-15T15:59:09Z)
- Multiplicative Position-aware Transformer Models for Language Understanding [17.476450946279037]
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks.
In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks.
We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods.
arXiv Detail & Related papers (2021-09-27T04:18:32Z)
- Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for transformer to capture sequence ordering of input tokens.
We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
- Conformer-based End-to-end Speech Recognition With Rotary Position Embedding [11.428057887454008]
We introduce rotary position embedding (RoPE) into the convolution-augmented transformer (conformer).
RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into a self-attention module.
Our model achieves relative word error rate reductions of 8.70% and 7.27% over the conformer on the test-clean and test-other sets of the LibriSpeech corpus, respectively.
arXiv Detail & Related papers (2021-07-13T08:07:22Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based on the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); a sketch of the underlying Toeplitz trick follows below.
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
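As a short, self-contained illustration of the Toeplitz observation in the last entry (our own sketch, not code from the paper): a relative-position bias matrix T with T[m, n] = t_{m-n} is Toeplitz, so T @ v can be computed in O(n log n) by embedding T in a circulant matrix and multiplying in Fourier space. The names `toeplitz_matvec`, `t_pos`, and `t_neg` are hypothetical, and the paper applies this identity inside kernelized attention rather than as a standalone product.

```python
import numpy as np

def toeplitz_matvec(t_pos: np.ndarray, t_neg: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Compute T @ v where T[m, n] = t_pos[m-n] for m >= n, else t_neg[n-m]."""
    n = v.shape[0]
    # First column of the (2n-1)-dim circulant embedding of T: the first
    # column of T, followed by its first row (minus the diagonal) reversed.
    c = np.concatenate([t_pos, t_neg[1:][::-1]])                   # (2n-1,)
    prod = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v, 2 * n - 1))
    return prod[:n].real                                           # top block is T @ v

rng = np.random.default_rng(0)
n = 128
t_pos, t_neg = rng.standard_normal((2, n))
v = rng.standard_normal(n)

# Check against the explicit O(n^2) construction of the Toeplitz matrix.
T = np.empty((n, n))
for m in range(n):
    for j in range(n):
        T[m, j] = t_pos[m - j] if m >= j else t_neg[j - m]
assert np.allclose(T @ v, toeplitz_matvec(t_pos, t_neg, v))
```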