Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
- URL: http://arxiv.org/abs/2505.11892v1
- Date: Sat, 17 May 2025 08:03:50 GMT
- Title: Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
- Authors: Josh Alman, Zhao Song
- Abstract summary: A main bottleneck in the time to perform transformer computations is a task called attention computation. We show how to overcome this issue, and give a new algorithm to compute the RoPE attention in almost linear time.
- Score: 10.88046646153971
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The transformer architecture has been widely applied to many machine learning tasks. A main bottleneck in the time to perform transformer computations is a task called attention computation. [Alman and Song, NeurIPS 2023] have shown that in the bounded entry regime, there is an almost linear time algorithm to approximate the attention computation. They also proved that the bounded entry assumption is necessary for a fast algorithm assuming the popular Strong Exponential Time Hypothesis. A new version of the transformer that uses position embeddings has recently been very successful. At a high level, position embedding enables the model to capture the correlations between tokens while taking into account their position in the sequence. Perhaps the most popular and effective version is Rotary Position Embedding (RoPE), which was proposed by [Su, Lu, Pan, Murtadha, Wen, and Liu, Neurocomputing 2024]. A main downside of RoPE is that it complicates the attention computation problem, so that previous techniques for designing almost linear time algorithms no longer seem to work. In this paper, we show how to overcome this issue, and give a new algorithm to compute the RoPE attention in almost linear time in the bounded entry regime. (Again, known lower bounds imply that bounded entries are necessary.) Our new algorithm combines two techniques in a novel way: the polynomial method, which was used in prior fast attention algorithms, and the Fast Fourier Transform.
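The two ingredients the abstract names can be illustrated concretely. Below is a minimal NumPy sketch, written for this summary, of (1) RoPE itself together with the exact quadratic-time RoPE attention it complicates, and (2) the polynomial method that [Alman and Song, NeurIPS 2023] used for standard attention. Function names and conventions (the adjacent-pair rotation, the 1/sqrt(d) scaling) are illustrative assumptions; neither block is the paper's almost-linear-time RoPE algorithm.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary Position Embedding (Su et al., Neurocomputing 2024).

    x has shape (seq_len, d) with d even. At position m, each adjacent
    coordinate pair (x[2i], x[2i+1]) is rotated by the angle
    m * base**(-2i/d), so rotated queries and keys have inner products
    that depend on the tokens' relative positions.
    """
    seq_len, d = x.shape
    theta = base ** (-np.arange(0, d, 2) / d)        # (d/2,) frequencies
    ang = np.outer(np.arange(seq_len), theta)        # (seq_len, d/2) angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def exact_rope_attention(Q, K, V):
    """Reference O(n^2) computation: the quantity the paper approximates."""
    Qr, Kr = rope(Q), rope(K)
    A = np.exp(Qr @ Kr.T / np.sqrt(Q.shape[1]))      # n x n attention matrix
    return (A / A.sum(axis=1, keepdims=True)) @ V
```

The polynomial method avoids the n x n matrix for standard attention: in the bounded entry regime, exp(q . k) is well approximated by a low-degree Taylor polynomial, whose monomial terms give an explicit low-rank factorization of the attention matrix. A sketch under the same hedges:

```python
import numpy as np
from itertools import combinations_with_replacement
from math import factorial

def poly_features(X, degree):
    """Feature map phi with phi(q) @ phi(k) = sum_{t<=degree} (q @ k)**t / t!,
    the Taylor approximation of exp(q @ k). It is accurate when |q @ k| = O(1),
    which is exactly the bounded entry regime.
    """
    n, d = X.shape
    cols = [np.ones(n)]                              # degree-0 term
    for t in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), t):
            a_fact = 1.0                             # alpha! for this monomial
            for i in set(idx):
                a_fact *= factorial(idx.count(i))
            cols.append(X[:, list(idx)].prod(axis=1) / np.sqrt(a_fact))
    return np.stack(cols, axis=1)                    # n x O(d**degree) features

def poly_attention(Q, K, V, degree=4):
    """Almost linear in n: the n x n attention matrix is never materialized."""
    PQ, PK = poly_features(Q, degree), poly_features(K, degree)
    num = PQ @ (PK.T @ V)                            # (n x r) times (r x d_v)
    den = PQ @ PK.sum(axis=0)                        # approximate row sums
    return num / den[:, None]
```

Simply feeding `rope(Q)` and `rope(K)` into `poly_attention` does not recover the paper's guarantees: the paper treats a general form of RoPE attention in which the relative-position matrices need not factor into per-token rotations, so the low-rank trick alone fails; handling that relative-position structure is, roughly, where the Fast Fourier Transform enters.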
Related papers
- Fast Gradient Computation for RoPE Attention in Almost Linear Time [27.28314860714307]
We develop the first almost linear time algorithm for backward computations in RoPE-based attention under bounded entries. Our approach builds on recent advancements in fast RoPE attention computations.
arXiv Detail & Related papers (2024-12-23T06:20:22Z)
- Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search [0.0]
We present a tensor graph rewriting approach that uses Monte Carlo tree search to build a superior representation.
Our approach improves the inference speed of neural networks by up to 11% compared to existing methods.
arXiv Detail & Related papers (2024-10-07T22:22:02Z)
- Fundamental Limitations on Subquadratic Alternatives to Transformers [3.514389461266844]
We focus on document similarity tasks, where one is given many documents as input and would like to find the pair which is approximately the most similar.
We prove that the Transformer is able to perform this task, and that no algorithm can perform it in truly subquadratic time.
arXiv Detail & Related papers (2024-10-05T19:21:13Z)
- Pre-Clustering Point Clouds of Crop Fields Using Scalable Methods [14.06711982797654]
We show a similarity between the current state-of-the-art for this problem and a commonly used density-based clustering algorithm, Quickshift.
We propose a number of novel, application-specific algorithms with the goal of producing a general and scalable plant segmentation algorithm.
When incorporated into field-scale phenotyping systems, the proposed algorithms should work as a drop-in replacement that can greatly improve the accuracy of results.
arXiv Detail & Related papers (2021-07-22T22:47:22Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); a minimal sketch of the underlying Toeplitz-FFT product appears after this list.
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
- Higher-order Derivatives of Weighted Finite-state Machines [68.43084108204741]
This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines.
We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature.
Our algorithm is significantly faster than prior algorithms.
arXiv Detail & Related papers (2021-06-01T19:51:55Z)
- Linear Bandit Algorithms with Sublinear Time Complexity [67.21046514005029]
We propose to accelerate existing linear bandit algorithms to achieve per-step time complexity sublinear in the number of arms $K$.
We show that our proposed algorithms can achieve $O(K^{1-\alpha(T)})$ per-step complexity for some $\alpha(T) > 0$ and $\widetilde{O}(\sqrt{T})$ regret, where $T$ is the time horizon.
arXiv Detail & Related papers (2021-03-03T22:42:15Z)
- Quantum-Inspired Classical Algorithm for Principal Component Regression [1.9105479266011323]
We develop an algorithm for principal component regression that runs in time polylogarithmic in the number of data points.
This exponential speed-up enables potential applications to much larger data sets.
arXiv Detail & Related papers (2020-10-16T20:50:48Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied to economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings, as they either require intolerable running time or consume excessive computational resources.
We propose NeuSearcher, which leverages knowledge learned from previously solved instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable to those of off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
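As referenced in the kernelized-attention entry above, the primitive behind that paper's FFT speedup is that multiplying a Toeplitz matrix (one whose entries depend only on i - j, as relative positional encodings do) by a vector reduces to a circular convolution. The NumPy sketch below, with helper names of our choosing, illustrates only this primitive, not that paper's full kernelized-attention algorithm.

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply an n x n Toeplitz matrix by a vector in O(n log n) time.

    c is the first column, r the first row (with r[0] == c[0]). The matrix
    is embedded in a 2n x 2n circulant matrix, whose action on a zero-padded
    vector is a circular convolution, i.e. three FFTs.
    """
    n = len(x)
    v = np.concatenate([c, [0.0], r[:0:-1]])         # circulant first column
    xp = np.concatenate([x, np.zeros(n)])            # zero-pad x to length 2n
    return np.fft.ifft(np.fft.fft(v) * np.fft.fft(xp))[:n].real

# Sanity check against the dense O(n^2) product.
n = 16
c, r = np.random.randn(n), np.random.randn(n)
r[0] = c[0]
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = np.random.randn(n)
assert np.allclose(toeplitz_matvec(c, r, x), T @ x)
```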
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.