RoFormer: Enhanced Transformer with Rotary Position Embedding
- URL: http://arxiv.org/abs/2104.09864v5
- Date: Wed, 8 Nov 2023 13:36:32 GMT
- Title: RoFormer: Enhanced Transformer with Rotary Position Embedding
- Authors: Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
- Abstract summary: We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage positional information.
RoPE encodes the absolute position with a rotation matrix while incorporating explicit relative position dependency into the self-attention formulation.
We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
- Score: 9.01819510933327
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Position encoding has recently been shown to be effective in the transformer
architecture. It provides valuable supervision for modeling dependencies between
elements at different positions of the sequence. In this paper, we first
investigate various methods of integrating positional information into the
learning process of transformer-based language models. We then propose a novel
method named Rotary Position Embedding (RoPE) to effectively leverage
positional information. Specifically, the proposed RoPE encodes the absolute
position with a rotation matrix while incorporating explicit relative position
dependency into the self-attention formulation. Notably, RoPE offers valuable
properties, including flexibility in sequence length, decaying inter-token
dependency with increasing relative distance, and the capability of equipping
linear self-attention with relative position encoding. Finally, we evaluate the
enhanced transformer with rotary position embedding, also called RoFormer, on
various long-text classification benchmark datasets. Our experiments show that
it consistently outperforms its alternatives. Furthermore, we provide a
theoretical analysis to explain some of the experimental results. RoFormer is
already integrated into Hugging Face:
\url{https://huggingface.co/docs/transformers/model_doc/roformer}.
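To make the rotation-matrix formulation above concrete, here is a minimal NumPy sketch of rotary position embedding applied to per-token query/key vectors. It uses the frequency schedule theta_i = 10000^(-2i/d) described in the paper, but the function name and demo shapes are illustrative, and production implementations (including the Hugging Face RoFormer model) may organize the channel pairs differently.

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Rotate channel pairs of x (shape: seq_len x dim) by position-dependent angles.

    Pair (2i, 2i+1) at position m is rotated by m * theta_i, with
    theta_i = base ** (-2i / dim), the frequency schedule used by RoPE.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "head dimension must be even"
    theta = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,) per-pair frequencies
    angles = np.outer(np.arange(seq_len), theta)    # (seq_len, dim/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin       # 2-D rotation of each channel pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# After rotation, the dot product q_m . k_n depends on the relative offset m - n,
# so plain dot-product attention scores become relative-position aware.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64))
k = rng.standard_normal((8, 64))
scores = rotary_embedding(q) @ rotary_embedding(k).T
```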
Related papers
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
- Benchmarking Rotary Position Embeddings for Automatic Speech Recognition [17.360059094663182]
Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models.
RoPE consistently achieves lower error rates than the relative positional embeddings currently in wide use.
To facilitate further research, we release the implementation and all experimental recipes through the SpeechBrain toolkit.
arXiv Detail & Related papers (2025-01-10T15:30:46Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- Spherical Position Encoding for Transformers [0.0]
We introduce the notion of "geotokens" which are input elements for transformer architectures.
Unlike natural language, the sequential position is not important for the model, but the geographical coordinates are.
We formulate a position encoding mechanism based on RoPE architecture which is adjusted for spherical coordinates.
arXiv Detail & Related papers (2023-10-04T09:28:59Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism.
RoITr surpasses existing methods by at least 13 and 5 percentage points in Inlier Ratio and Registration Recall, respectively.
arXiv Detail & Related papers (2023-03-14T20:55:27Z)
- A Length-Extrapolatable Transformer [98.54835576985664]
We focus on length extrapolation, i.e., training on short texts while evaluating longer sequences.
We introduce a relative position embedding to explicitly maximize attention resolution.
We evaluate different Transformer variants with language modeling.
arXiv Detail & Related papers (2022-12-20T18:56:20Z)
- Multiplicative Position-aware Transformer Models for Language Understanding [17.476450946279037]
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks.
In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks.
We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods.
arXiv Detail & Related papers (2021-09-27T04:18:32Z)
- Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for transformers to capture the sequence ordering of input tokens.
We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
- Conformer-based End-to-end Speech Recognition With Rotary Position Embedding [11.428057887454008]
We introduce rotary position embedding (RoPE) into the convolution-augmented transformer (conformer).
RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into a self-attention module.
Our model achieves a relative word error rate reduction of 8.70% and 7.27% over the conformer on test-clean and test-other sets of the LibriSpeech corpus respectively.
arXiv Detail & Related papers (2021-07-13T08:07:22Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based on the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be computed efficiently using the Fast Fourier Transform (FFT).
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
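The Toeplitz observation in the last entry lends itself to a short sketch: a matrix whose (i, j) entry depends only on i - j can be multiplied by a vector in O(n log n) by embedding it in a circulant matrix and applying the FFT. The sketch below shows only this core trick under generic assumptions; the names bias and toeplitz_matvec_fft are illustrative rather than taken from the paper, and the paper's full kernelized-attention algorithm builds further on this idea.

```python
import numpy as np

def toeplitz_matvec_fft(bias, v):
    """Multiply the Toeplitz matrix T[i, j] = c[i - j] by vector v via FFT.

    bias stores the relative-position scores c[k] for offsets k = -(n-1)..(n-1),
    so bias[k + n - 1] = c[k]. Embedding T in a circulant matrix turns the
    product into a circular convolution, computed in O(n log n) with the FFT.
    """
    n = v.shape[0]
    assert bias.shape[0] == 2 * n - 1
    col = bias[n - 1:]                       # offsets 0, 1, ..., n-1
    row = bias[:n - 1]                       # offsets -(n-1), ..., -1
    circ = np.concatenate([col, row])        # first column of the circulant embedding
    v_pad = np.concatenate([v, np.zeros(n - 1)])
    out = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(v_pad))
    return out[:n].real

# Sanity check against the dense Toeplitz product.
n = 6
rng = np.random.default_rng(0)
bias = rng.standard_normal(2 * n - 1)        # relative-position scores c[i - j]
v = rng.standard_normal(n)
T = np.array([[bias[(i - j) + (n - 1)] for j in range(n)] for i in range(n)])
assert np.allclose(T @ v, toeplitz_matvec_fft(bias, v))
```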