RoFormer: Enhanced Transformer with Rotary Position Embedding
- URL: http://arxiv.org/abs/2104.09864v5
- Date: Wed, 8 Nov 2023 13:36:32 GMT
- Title: RoFormer: Enhanced Transformer with Rotary Position Embedding
- Authors: Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
- Abstract summary: We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage positional information.
RoPE encodes the absolute position with a rotation matrix while incorporating explicit relative position dependency into the self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
- Score: 9.01819510933327
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Position encoding has recently been shown to be effective in the transformer
architecture. It provides valuable supervision for modeling dependencies between
elements at different positions of the sequence. In this paper, we first
investigate various methods of integrating positional information into the
learning process of transformer-based language models. We then propose a novel
method named Rotary Position Embedding (RoPE) to effectively leverage
positional information. Specifically, the proposed RoPE encodes the absolute
position with a rotation matrix while incorporating explicit relative position
dependency into the self-attention formulation. Notably, RoPE offers valuable
properties, including flexibility in sequence length, decaying inter-token
dependency with increasing relative distance, and the capability of equipping
linear self-attention with relative position encoding. Finally, we evaluate the
enhanced transformer with rotary position embedding, also called RoFormer, on
various long-text classification benchmark datasets. Our experiments show that
it consistently outperforms its alternatives. Furthermore, we provide a
theoretical analysis to explain some of the experimental results. RoFormer is
already integrated into Hugging Face:
\url{https://huggingface.co/docs/transformers/model_doc/roformer}.
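To make the rotation-matrix formulation above concrete, here is a minimal NumPy sketch of rotary position embedding applied to per-token query/key vectors. It uses the frequency schedule theta_i = 10000^(-2i/d) described in the paper, but the function name and demo shapes are illustrative, and production implementations (including the Hugging Face RoFormer model) may organize the channel pairs differently.

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Rotate channel pairs of x (shape: seq_len x dim) by position-dependent angles.

    Pair (2i, 2i+1) at position m is rotated by m * theta_i, with
    theta_i = base ** (-2i / dim), the frequency schedule used by RoPE.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "head dimension must be even"
    theta = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,) per-pair frequencies
    angles = np.outer(np.arange(seq_len), theta)    # (seq_len, dim/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin       # 2-D rotation of each channel pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# After rotation, the dot product q_m . k_n depends on the relative offset m - n,
# so plain dot-product attention scores become relative-position aware.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64))
k = rng.standard_normal((8, 64))
scores = rotary_embedding(q) @ rotary_embedding(k).T
```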
Related papers
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- Benchmarking Rotary Position Embeddings for Automatic Speech Recognition [17.360059094663182]
Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models.
RoPE consistently achieves lower error rates than the relative positional embeddings currently in wide use.
To facilitate further research, we release the implementation and all experimental recipes through the SpeechBrain toolkit.
arXiv Detail & Related papers (2025-01-10T15:30:46Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- Spherical Position Encoding for Transformers [0.0]
We introduce the notion of "geotokens" which are input elements for transformer architectures.
Unlike natural language, the sequential position is not important for the model, but the geographical coordinates are.
We formulate a position encoding mechanism based on RoPE architecture which is adjusted for spherical coordinates.
arXiv Detail & Related papers (2023-10-04T09:28:59Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism.
RoITr surpasses existing methods by at least 13 and 5 percentage points in Inlier Ratio and Registration Recall, respectively.
arXiv Detail & Related papers (2023-03-14T20:55:27Z)
- A Length-Extrapolatable Transformer [98.54835576985664]
We focus on length extrapolation, i.e., training on short texts while evaluating longer sequences.
We introduce a relative position embedding to explicitly maximize attention resolution.
We evaluate different Transformer variants with language modeling.
arXiv Detail & Related papers (2022-12-20T18:56:20Z)
- Multiplicative Position-aware Transformer Models for Language Understanding [17.476450946279037]
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks.
In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks.
We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods.
arXiv Detail & Related papers (2021-09-27T04:18:32Z)
- Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for transformers to capture the sequence ordering of input tokens.
We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
- Conformer-based End-to-end Speech Recognition With Rotary Position Embedding [11.428057887454008]
We introduce rotary position embedding (RoPE) into the convolution-augmented transformer (conformer).
RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into a self-attention module.
Our model achieves a relative word error rate reduction of 8.70% and 7.27% over the conformer on test-clean and test-other sets of the LibriSpeech corpus respectively.
arXiv Detail & Related papers (2021-07-13T08:07:22Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based on the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be computed efficiently using the Fast Fourier Transform (FFT).
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
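The Toeplitz observation in the last entry lends itself to a short sketch: a matrix whose (i, j) entry depends only on i - j can be multiplied by a vector in O(n log n) by embedding it in a circulant matrix and applying the FFT. The sketch below shows only this core trick under generic assumptions; the names bias and toeplitz_matvec_fft are illustrative rather than taken from the paper, and the paper's full kernelized-attention algorithm builds further on this idea.

```python
import numpy as np

def toeplitz_matvec_fft(bias, v):
    """Multiply the Toeplitz matrix T[i, j] = c[i - j] by vector v via FFT.

    bias stores the relative-position scores c[k] for offsets k = -(n-1)..(n-1),
    so bias[k + n - 1] = c[k]. Embedding T in a circulant matrix turns the
    product into a circular convolution, computed in O(n log n) with the FFT.
    """
    n = v.shape[0]
    assert bias.shape[0] == 2 * n - 1
    col = bias[n - 1:]                       # offsets 0, 1, ..., n-1
    row = bias[:n - 1]                       # offsets -(n-1), ..., -1
    circ = np.concatenate([col, row])        # first column of the circulant embedding
    v_pad = np.concatenate([v, np.zeros(n - 1)])
    out = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(v_pad))
    return out[:n].real

# Sanity check against the dense Toeplitz product.
n = 6
rng = np.random.default_rng(0)
bias = rng.standard_normal(2 * n - 1)        # relative-position scores c[i - j]
v = rng.standard_normal(n)
T = np.array([[bias[(i - j) + (n - 1)] for j in range(n)] for i in range(n)])
assert np.allclose(T @ v, toeplitz_matvec_fft(bias, v))
```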