Support Basis: Fast Attention Beyond Bounded Entries
- URL: http://arxiv.org/abs/2510.01643v1
- Date: Thu, 02 Oct 2025 03:51:28 GMT
- Title: Support Basis: Fast Attention Beyond Bounded Entries
- Authors: Maryam Aliakbarpour, Vladimir Braverman, Junze Yin, Haochen Zhang,
- Abstract summary: We introduce support-basis decomposition, a new framework for efficient attention approximation beyond bounded entries. Our approach exploits the sub-Gaussian behavior of the query and key entries to split large and small entries, enabling exact computation on sparse components and polynomial approximation on dense components.
- Score: 21.21399891887812
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The quadratic complexity of softmax attention remains a central bottleneck in scaling large language models (LLMs). [Alman and Song, NeurIPS 2023] proposed a sub-quadratic attention approximation algorithm, but it works only under the restrictive bounded-entry assumption. Since this assumption rarely holds in practice, its applicability to modern LLMs is limited. In this paper, we introduce support-basis decomposition, a new framework for efficient attention approximation beyond bounded entries. We empirically demonstrate that the entries of the query and key matrices exhibit sub-Gaussian behavior. Our approach uses this property to split large and small entries, enabling exact computation on sparse components and polynomial approximation on dense components. We establish rigorous theoretical guarantees, proving a sub-quadratic runtime, and extend the method to a multi-threshold setting that eliminates all distributional assumptions. Furthermore, we provide the first theoretical justification for the empirical success of polynomial attention [Kacham, Mirrokni, and Zhong, ICML 2024], showing that softmax attention can be closely approximated by a combination of multiple polynomial attentions with sketching.
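The split-and-approximate idea from the abstract can be sketched as follows. This is a hedged illustration, not the authors' algorithm: the threshold `tau`, the Taylor degree, and the function interface are invented for the example, and the sparse part is handled here with a dense mask rather than true sparse arithmetic.

```python
import math
import numpy as np

def support_split_attention(Q, K, V, tau=2.0, degree=6):
    """Illustrative sketch of the split idea: score entries above the
    threshold tau are computed exactly on their (assumed sparse) support,
    while the small remaining entries use a low-degree Taylor polynomial
    approximation of exp."""
    S = Q @ K.T / math.sqrt(Q.shape[1])        # scaled attention scores
    large = np.abs(S) > tau                    # support of the large entries
    # Degree-`degree` Taylor expansion of exp; accurate only for small |S|.
    poly = sum(S**k / math.factorial(k) for k in range(degree + 1))
    E = np.where(large, np.exp(S), poly)       # exact on the large entries
    return (E / E.sum(axis=1, keepdims=True)) @ V  # row-normalize, apply V
```

With `tau=0` nearly every entry falls on the exact branch, so the sketch reduces to ordinary softmax attention; the interesting regime is a `tau` for which the large-entry support is genuinely sparse.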
Related papers
- The Theory and Practice of MAP Inference over Non-Convex Constraints [11.058494098615576]
In safety-critical settings, probabilistic ML systems have to make predictions subject to algebraic constraints. This makes computing the constrained maximum a posteriori (MAP) prediction efficiently and reliably extremely challenging. We devise a scalable message-passing algorithm for a tractable fragment of this problem. Then, we devise a general constrained MAP strategy that interleaves partitioning the domain into convex feasible regions.
arXiv Detail & Related papers (2026-02-09T14:05:58Z) - Adaptive Sparsification for Linear Programming [3.735586259382096]
We introduce a generic framework for solving linear programs with many constraints $(n \gg d)$ via adaptive sparsification. First, we present a quantum version of Clarkson's algorithm that finds an exact solution to an LP using $\tilde{O}(\sqrt{n}\, d^3)$ row-queries to the constraint matrix. Second, our framework yields new state-of-the-art algorithms for mixed packing and covering problems when the packing constraints are "simple".
arXiv Detail & Related papers (2025-10-09T15:36:00Z) - Mechanisms for Quantum Advantage in Global Optimization of Nonconvex Functions [6.135587835061064]
We show new theoretical mechanisms for quantum speedup in the global optimization of nonconvex functions. We formalize these ideas by proving that a real-space quantum algorithm (RsAA) achieves provably polynomial runtimes.
arXiv Detail & Related papers (2025-10-03T17:40:31Z) - Maximum a Posteriori Inference for Factor Graphs via Benders' Decomposition [0.38233569758620056]
We present a method for maximum a-posteriori inference in general Bayesian factor models.
We derive MAP estimation algorithms for the Bayesian Gaussian mixture model and latent Dirichlet allocation.
arXiv Detail & Related papers (2024-10-24T19:57:56Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming [1.3053649021965603]
The minimum sum-of-squares clustering (MSSC) has been recently extended to exploit prior knowledge on the cardinality of each cluster.
We propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC.
For the upper bound, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node.
arXiv Detail & Related papers (2022-09-19T10:19:06Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Constrained mixers for the quantum approximate optimization algorithm [55.41644538483948]
We present a framework for constructing mixing operators that restrict the evolution to a subspace of the full Hilbert space.
We generalize the "XY"-mixer designed to preserve the subspace of "one-hot" states to the general case of subspaces given by a number of computational basis states.
Our analysis also leads to valid Trotterizations for the "XY"-mixer with fewer CX gates than previously known.
arXiv Detail & Related papers (2022-03-11T17:19:26Z) - Near-Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time [46.31710224483631]
We show how to bypass the main open question of Nelson and Nguyen (FOCS, 2013) regarding the logarithmic factors in the sketching dimension for existing constant factor approximation oblivious subspace embeddings.
A key technique we use is an explicit mapping of Indyk based on uncertainty principles and extractors.
For the fundamental problems of rank computation and finding a linearly independent subset of columns, our algorithms improve Cheung, Kwok, and Lau (JACM, 2013) and are optimal to within a constant factor and a $\log\log(n)$-factor, respectively.
arXiv Detail & Related papers (2021-07-16T19:34:10Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Semi-Sparsity for Smoothing Filters [1.1404527665142667]
We show a new semi-sparsity smoothing algorithm based on a novel sparsity-inducing framework.
We show many benefits to a series of signal/image processing and computer vision applications.
arXiv Detail & Related papers (2021-07-01T17:31:42Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
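The thresholding idea can be sketched as follows. This is a hedged illustration, assuming "thresholding" means keeping only the largest-magnitude loadings of the leading principal direction; the function name, the sparsity parameter `k`, and the interface are invented for the example and are not the paper's exact algorithm.

```python
import numpy as np

def thresholded_spca_sketch(X, k):
    """Illustrative thresholding heuristic for sparse PCA: compute the
    leading principal direction of the data, then zero out all but its
    k largest-magnitude loadings and renormalize."""
    cov = X.T @ X / X.shape[0]            # empirical covariance matrix
    _, vecs = np.linalg.eigh(cov)         # eigenvectors, ascending eigenvalues
    v = vecs[:, -1]                       # leading principal direction
    small = np.argsort(np.abs(v))[:-k]    # indices of the d-k smallest loadings
    v_sparse = v.copy()
    v_sparse[small] = 0.0                 # threshold them to zero
    return v_sparse / np.linalg.norm(v_sparse)
```

The result is a unit vector with at most k nonzero entries, trading a little explained variance for interpretability.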
arXiv Detail & Related papers (2020-06-23T04:25:36Z) - The limits of min-max optimization algorithms: convergence to spurious non-critical sets [82.74514886461257]
Min-max optimization algorithms encounter far greater problems because of the existence of periodic cycles and similar phenomena.
We show that there exist attractors of these algorithms that do not contain any critical points of the problem.
We illustrate such challenges in simple almost bilinear games.
arXiv Detail & Related papers (2020-06-16T10:49:27Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to deep learning applications (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.