$XX^{t}$ Can Be Faster
- URL: http://arxiv.org/abs/2505.09814v2
- Date: Fri, 16 May 2025 09:23:27 GMT
- Title: $XX^{t}$ Can Be Faster
- Authors: Dmitry Rybin, Yushun Zhang, Zhi-Quan Luo
- Abstract summary: We present RXTX, a new algorithm for computing the product of a matrix with its transpose, $XX^{t}$ for $X\in \mathbb{R}^{n\times m}$. RXTX uses $5\%$ fewer multiplications and $5\%$ fewer operations (additions and multiplications) than State-of-the-Art algorithms.
- Score: 18.4199325543047
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present RXTX, a new algorithm for computing the product of a matrix with its transpose, $XX^{t}$ for $X\in \mathbb{R}^{n\times m}$. RXTX uses $5\%$ fewer multiplications and $5\%$ fewer operations (additions and multiplications) than State-of-the-Art algorithms. Note that the acceleration holds not only asymptotically for large matrices as $n \rightarrow \infty$, but also for small matrices, including $n = 4$. The algorithm was discovered by combining Machine Learning-based search methods with Combinatorial Optimization.
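The RXTX recursion itself is not reproduced in this summary. As a frame of reference, the minimal numpy sketch below only illustrates the target operation $XX^{t}$ and the classical symmetry saving (forming just the on-or-above-diagonal blocks); the blocking scheme and function name are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def gram_upper(X: np.ndarray, block: int = 64) -> np.ndarray:
    """Compute G = X @ X.T by forming only the blocks on or above the
    diagonal and mirroring them, roughly halving the block products
    versus the naive full product. (Baseline sketch only; this is
    NOT the RXTX recursion from the paper.)"""
    n = X.shape[0]
    G = np.empty((n, n), dtype=X.dtype)
    for i in range(0, n, block):
        for j in range(i, n, block):  # j >= i: upper-triangular blocks only
            Gij = X[i:i + block] @ X[j:j + block].T
            G[i:i + block, j:j + block] = Gij
            if j > i:  # fill the lower triangle by symmetry
                G[j:j + block, i:i + block] = Gij.T
    return G

X = np.random.randn(256, 100)
assert np.allclose(gram_upper(X), X @ X.T)
```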
Related papers
- Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach [36.2561379432247]
Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix-multiplication tensor. We design a neural architecture, StrassenNet, which reproduces the Strassen algorithm for $2\times 2$ multiplication. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$.
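For context, the rank-7 scheme that StrassenNet reportedly rediscovers for $2\times 2$ multiplication is Strassen's classical algorithm; a minimal sketch of one step (standard textbook construction, not code from the paper):

```python
import numpy as np

def strassen_2x2(A, B):
    """One step of Strassen's algorithm: multiplies 2x2 (block)
    matrices with 7 multiplications instead of the naive 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```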
arXiv Detail & Related papers (2026-02-25T11:22:31Z) - Fast convolution algorithm for state space models [0.0]
We present an unconditionally stable algorithm for applying the matrix transfer function of a linear time-invariant (LTI) system in the time domain. Applying such a transfer function to compute $L$ states requires no more than $2L$ matrix-vector multiplications.
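The paper's stable algorithm is not detailed in this snippet; for orientation only, a common baseline for applying an LTI state-space system over $L$ steps is to materialize its impulse-response kernel $h_k = C A^k B$ and convolve via FFT. The model $(A, B, C)$ and function name below are illustrative assumptions.

```python
import numpy as np

def ssm_convolve(A, B, C, u):
    """Apply y_t = sum_k C A^k B u_{t-k} for an LTI state-space model by
    materializing the impulse-response kernel and convolving via FFT.
    Baseline sketch, not the paper's unconditionally stable algorithm."""
    L = len(u)
    h, P = [], B
    for _ in range(L):          # kernel h_k = C A^k B, k = 0..L-1
        h.append(C @ P)
        P = A @ P
    h = np.array(h).ravel()
    # Linear (causal) convolution via FFT with zero padding
    y = np.fft.irfft(np.fft.rfft(h, 2 * L) * np.fft.rfft(u, 2 * L), 2 * L)
    return y[:L]

d = 4
A = 0.5 * np.eye(d)             # stable toy dynamics
B, C = np.random.randn(d), np.random.randn(1, d)
u = np.random.randn(32)
y = ssm_convolve(A, B, C, u)
```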
arXiv Detail & Related papers (2024-11-22T05:30:03Z) - Efficient Matrix Factorization Via Householder Reflections [2.3326951882644553]
We show that the exact recovery of the factors $\mathbf{H}$ and $\mathbf{X}$ from $\mathbf{Y}$ is guaranteed with $\Omega$ columns in $\mathbf{Y}$.
We hope the techniques in this work help in developing alternate algorithms for dictionary learning.
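As generic background for the building block in the title (not the paper's factorization algorithm): a Householder reflection $H = I - 2vv^{t}/(v^{t}v)$ is an orthogonal, symmetric matrix that reflects across the hyperplane with normal $v$.

```python
import numpy as np

def householder(v):
    """Householder reflection H = I - 2 v v^T / (v^T v): orthogonal,
    symmetric, and it reflects across the hyperplane with normal v."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

v = np.random.randn(5)
H = householder(v)
assert np.allclose(H @ H.T, np.eye(5))   # orthogonal
assert np.allclose(H @ v, -v)            # negates its own normal direction
```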
arXiv Detail & Related papers (2024-05-13T11:13:49Z) - Matrix Compression via Randomized Low Rank and Low Precision Factorization [47.902465710511485]
Modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.
We propose an algorithm that exploits this structure to obtain a low rank decomposition of any matrix $\mathbf{A}$ as $\mathbf{L}\mathbf{R}$.
We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-7b.
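A hedged sketch of the general recipe suggested by the title, randomized range-finding followed by low-precision storage of the two factors; the rank, the float16 choice, and the helper name are assumptions, not the paper's exact scheme.

```python
import numpy as np

def lowrank_lowprec(A, rank=16, seed=0):
    """Sketch: randomized rank-r factorization A ~= L @ R, with both
    factors stored in float16. Not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((A.shape[1], rank))
    Q, _ = np.linalg.qr(A @ S)        # orthonormal basis for the range of A
    L = Q.astype(np.float16)          # low-precision left factor
    R = (Q.T @ A).astype(np.float16)  # low-precision right factor
    return L, R

A = np.random.randn(200, 50) @ np.random.randn(50, 200)  # rank <= 50
L, R = lowrank_lowprec(A, rank=50)
err = np.linalg.norm(A - L.astype(np.float64) @ R.astype(np.float64))
print(f"relative error: {err / np.linalg.norm(A):.3f}")
```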
arXiv Detail & Related papers (2023-10-17T06:56:57Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Fast Matrix Multiplication Without Tears: A Constraint Programming Approach [8.52818380743467]
It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than the naive $NMP$ approach suggests.
This gives rise to the constraint satisfaction problem of fast matrix multiplication.
We propose a simple yet novel Constraint Programming approach to find non-commutative algorithms for fast matrix multiplication.
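The constraint satisfaction problem in question is usually stated through the Brent equations: coefficient triples $(u_r, v_r, w_r)$, $r = 1, \dots, R$, form a valid rank-$R$ scheme iff $\sum_r u_r \otimes v_r \otimes w_r$ equals the matrix-multiplication tensor. A sketch that checks this numerically for Strassen's rank-7 $2\times 2$ scheme (the flattening convention below is an illustrative choice):

```python
import numpy as np

# Matrix-multiplication tensor for 2x2: T[i, j, c] = 1 iff the product
# A_i * B_j contributes to entry C_c, with (row, col) flattened as 2*row+col.
T = np.zeros((4, 4, 4))
for m in range(2):
    for n in range(2):
        for k in range(2):
            T[2 * m + k, 2 * k + n, 2 * m + n] = 1

# Strassen's rank-7 factors (rows correspond to the 7 products m1..m7).
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# Brent equations: sum_r U[r] (x) V[r] (x) W[r] must reproduce T exactly.
recon = np.einsum('ri,rj,rc->ijc', U, V, W)
assert np.array_equal(recon, T)   # the rank-7 decomposition is valid
```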
arXiv Detail & Related papers (2023-06-01T19:15:24Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
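The paper's solver is iterative; for intuition only, here is a plain sketch-and-solve baseline for ridge regression with a Gaussian sketch (the sketch size and names are assumptions, not the paper's construction):

```python
import numpy as np

def sketched_ridge(A, b, lam, m, seed=0):
    """Sketch-and-solve ridge: compress the n x d design matrix with an
    m x n Gaussian sketch, then solve the small regularized problem.
    Illustrative baseline; the paper uses a sketching-based iterative solver."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((m, A.shape[0])) / np.sqrt(m)
    SA, Sb = S @ A, S @ b
    d = A.shape[1]
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

n, d = 5000, 50
A, b = np.random.randn(n, d), np.random.randn(n)
x_hat = sketched_ridge(A, b, lam=1.0, m=500)
x_star = np.linalg.solve(A.T @ A + np.eye(d), A.T @ b)
print(np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
```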
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Multiplying Matrices Without Multiplying [0.0]
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning.
We introduce a learning-based algorithm for this task that greatly outperforms existing methods.
arXiv Detail & Related papers (2021-06-21T05:08:54Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^{d}$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Hutch++: Optimal Stochastic Trace Estimation [75.45968495410048]
We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\mathrm{tr}(A)$ for any positive semidefinite (PSD) matrix $A$.
We show that it significantly outperforms Hutchinson's method in experiments.
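A compact numpy rendering of the idea behind Hutch++ as described above: deflate a randomized low-rank part of $A$ exactly, then run Hutchinson's estimator on the remainder. The query split and test matrix below are illustrative choices.

```python
import numpy as np

def hutchpp(matvec, n, num_queries, seed=0):
    """Hutch++ trace estimator for a PSD matrix given via `matvec`.
    Spends 2/3 of queries on a low-rank deflation, 1/3 on Hutchinson."""
    rng = np.random.default_rng(seed)
    k = num_queries // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))      # Rademacher sketch
    Q, _ = np.linalg.qr(matvec(S))                # basis for the top subspace
    t_low = np.trace(Q.T @ matvec(Q))             # exact trace on range(Q)
    G = rng.choice([-1.0, 1.0], size=(n, k))
    G = G - Q @ (Q.T @ G)                         # project out range(Q)
    t_rest = np.trace(G.T @ matvec(G)) / k        # Hutchinson on remainder
    return t_low + t_rest

B = np.random.randn(500, 500)
A = B @ B.T                                       # PSD test matrix
est = hutchpp(lambda V: A @ V, n=500, num_queries=90)
print(est, np.trace(A))
```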
arXiv Detail & Related papers (2020-10-19T16:45:37Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^{2}})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^{2}}\log^{2}\frac{1}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - What if Neural Networks had SVDs? [66.91160214071088]
Various neural networks employ time-consuming matrix operations like matrix inversion.
We present an algorithm that is fast enough to speed up several matrix operations.
arXiv Detail & Related papers (2020-09-29T12:58:52Z) - Approximate Multiplication of Sparse Matrices with Limited Space [24.517908972536432]
We develop sparse co-occurring directions, which reduces the time complexity to $\widetilde{O}\left((\mathrm{nnz}(X)+\mathrm{nnz}(Y))\ell+n\ell^{2}\right)$ in expectation.
Theoretical analysis reveals that the approximation error of our algorithm is almost the same as that of COD.
arXiv Detail & Related papers (2020-09-08T05:39:19Z)