Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
- URL: http://arxiv.org/abs/2602.21797v1
- Date: Wed, 25 Feb 2026 11:22:31 GMT
- Title: Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
- Authors: Paolo Andreini, Alessandra Bernardi, Monica Bianchini, Barbara Toniella Corradini, Sara Marziali, Giacomo Nunziati, Franco Scarselli
- Abstract summary: Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$.
- Score: 36.2561379432247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. Across many independent runs the network always converges to a rank-$7$ tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ could actually be the smallest effective rank of the $3\times 3$ matrix--multiplication tensor. We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$--parametrisation and report preliminary results consistent with the known bounds for the border rank of the $3\times 3$ matrix--multiplication tensor.
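To make the rank-$7$ claim concrete, here is a small numpy sketch (not the paper's code) that builds the $2\times 2$ matrix--multiplication tensor and checks that Strassen's classical factors give an exact rank-$7$ decomposition of it:

```python
import numpy as np

# The 2x2 matrix multiplication tensor T: with row-major flattening,
# C[a, b] = sum_c A[a, c] * B[c, b] means T[2a + c, 2c + b, 2a + b] = 1.
T = np.zeros((4, 4, 4))
for a in range(2):
    for b in range(2):
        for c in range(2):
            T[2 * a + c, 2 * c + b, 2 * a + b] = 1

# Strassen's factors: row t of U/V/W holds the coefficients of the t-th
# product m_t = (U[t] . vec(A)) * (V[t] . vec(B)), and vec(C) = W.T @ m.
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# Rank-7 reconstruction: T[i, j, k] = sum_t U[t, i] * V[t, j] * W[t, k].
assert np.allclose(np.einsum('ti,tj,tk->ijk', U, V, W), T)
```

This is exactly the object the network is trained to recover: the factors $(U, V, W)$ are the learnable parameters, and the loss measures how far their rank-$r$ sum is from $T$.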
Related papers
- $XX^{t}$ Can Be Faster [18.4199325543047]
We present RXTX, a new algorithm for computing the product of a matrix with its transpose, $XX^{t}$ for $X\in\mathbb{R}^{n\times m}$. RXTX uses $5\%$ fewer multiplications and $5\%$ fewer operations (additions and multiplications) than state-of-the-art algorithms.
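As a point of reference for where such savings come from (a baseline sketch, not the RXTX recursion itself): splitting $X$ into two row blocks makes $XX^{t}$ a $2\times 2$ block matrix whose off-diagonal blocks are transposes of each other, so only three of the four block products are needed.

```python
import numpy as np

def gram_blocked(X):
    """Compute X @ X.T from two row blocks of X, reusing the off-diagonal
    block product: the (2,1) block is the transpose of the (1,2) block,
    so only 3 of the 4 block products are formed."""
    h = X.shape[0] // 2
    X1, X2 = X[:h], X[h:]
    G11 = X1 @ X1.T          # symmetric block (itself a smaller XX^t)
    G22 = X2 @ X2.T          # symmetric block
    G12 = X1 @ X2.T          # the only general block product needed
    return np.block([[G11, G12], [G12.T, G22]])

X = np.random.randn(6, 4)
assert np.allclose(gram_blocked(X), X @ X.T)
```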
arXiv Detail & Related papers (2025-05-14T21:31:44Z) - Overcomplete Tensor Decomposition via Koszul-Young Flattenings [56.82556231289414]
We give a new algorithm for decomposing an $n_1\times n_2\times n_3$ tensor as the sum of a minimal number of rank-1 terms. We show that an even more general class of degree-$d$ flattenings cannot surpass rank $Cn$ for a constant $C = C(d)$.
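For readers unfamiliar with the terminology, the following illustrative snippet (random factors, arbitrary sizes) spells out what "sum of rank-1 terms" means for a third-order tensor:

```python
import numpy as np

# A rank-1 term of an n1 x n2 x n3 tensor is an outer product a (x) b (x) c;
# a rank-r decomposition writes the tensor as the sum of r such terms.
rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 5, 6, 3
A = rng.standard_normal((n1, r))   # columns: the a-vectors
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# T[i, j, k] = sum_t A[i, t] * B[j, t] * C[k, t]
T = np.einsum('it,jt,kt->ijk', A, B, C)

# Check one einsum against the explicit sum of rank-1 outer products.
T_sum = sum(np.multiply.outer(np.multiply.outer(A[:, t], B[:, t]), C[:, t])
            for t in range(r))
assert np.allclose(T, T_sum)
```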
arXiv Detail & Related papers (2024-11-21T17:41:09Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
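A minimal sketch of the learning problem being studied, with hypothetical parameter choices ($d$, $k$, and the sample size below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 10, 3, 5000

# Target: a linear combination of k ReLU activations,
# f(x) = sum_i c_i * relu(<w_i, x>).
W = rng.standard_normal((k, d))
c = rng.standard_normal(k)
relu = lambda z: np.maximum(z, 0.0)
f = lambda X: relu(X @ W.T) @ c

# Samples from the standard Gaussian on R^d; square loss of a hypothesis.
X = rng.standard_normal((n, d))
y = f(X)

def square_loss(W_hat, c_hat):
    return np.mean((relu(X @ W_hat.T) @ c_hat - y) ** 2)

print(square_loss(W, c))          # 0.0 at the target
print(square_loss(rng.standard_normal((k, d)), rng.standard_normal(k)))
```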
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Fast Matrix Multiplication Without Tears: A Constraint Programming
Approach [8.52818380743467]
It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests.
This gives rise to the constraint satisfaction problem of fast matrix multiplication.
We propose a simple yet novel Constraint Programming approach to find non-commutative algorithms for fast matrix multiplication.
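To illustrate the constraint satisfaction formulation: a rank-$r$ solution is a triple of coefficient matrices $(U, V, W)$ whose rank-1 terms sum to the matrix--multiplication tensor. The sketch below (illustrative code, not the paper's model) builds the trivial rank-8 solution for $2\times 2$ matrices and verifies it on random inputs; a constraint solver would instead search for a feasible assignment at rank 7.

```python
import numpy as np
from itertools import product

# The CSP: find r triples (u_t, v_t, w_t) with
#   sum_t u_t[i] * v_t[j] * w_t[k] = T[i, j, k]  (the matmul tensor).
# Below: the trivial rank-8 solution, one term per scalar product
# A[a, c] * B[c, b], checked by running the bilinear algorithm it encodes.
r, n = 8, 2
U, V, W = (np.zeros((r, n * n)) for _ in range(3))
for t, (a, b, c) in enumerate(product(range(n), range(n), range(n))):
    U[t, n * a + c] = 1      # picks A[a, c]
    V[t, n * c + b] = 1      # picks B[c, b]
    W[t, n * a + b] = 1      # contributes to C[a, b]

A, B = np.random.randn(n, n), np.random.randn(n, n)
m = (U @ A.ravel()) * (V @ B.ravel())      # the r scalar multiplications
C = (W.T @ m).reshape(n, n)
assert np.allclose(C, A @ B)
```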
arXiv Detail & Related papers (2023-06-01T19:15:24Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
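For orientation, this is the classical TT-SVD baseline that such results are usually measured against (a sketch of the standard construction, not the paper's bicriteria algorithm):

```python
import numpy as np

def tt_svd(T, max_rank):
    """Classical TT-SVD: factor a d-way tensor into a train of 3-way cores
    by repeated reshaping + truncated SVD."""
    dims = T.shape
    cores, rank = [], 1
    M = T.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        Uk, s, Vt = np.linalg.svd(M, full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(Uk[:, :r_new].reshape(rank, dims[k], r_new))
        M = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * dims[k + 1], -1)
        rank = r_new
    cores.append(M.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

T = np.random.randn(4, 5, 6, 7)
cores = tt_svd(T, max_rank=40)   # ranks large enough for exact recovery
assert np.allclose(tt_reconstruct(cores), T)
```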
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Training Multi-Layer Over-Parametrized Neural Network in Subquadratic
Time [12.348083977777833]
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
In this work, we show how to reduce the training cost per iteration.
arXiv Detail & Related papers (2021-12-14T18:13:36Z) - Reconstruction Algorithms for Low-Rank Tensors and Depth-3 Multilinear
Circuits [4.129484350382891]
We give new and efficient black-box reconstruction algorithms for some classes of depth-$3$ arithmetic circuits.
Our algorithm works over all fields of characteristic $0$ or large enough characteristic.
arXiv Detail & Related papers (2021-05-04T20:45:07Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
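An illustrative toy run of this setting (plain gradient descent on an over-parameterized CP objective; the sizes, step size, and iteration count below are hypothetical and may need tuning):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_rank, m = 8, 2, 10   # m > true_rank: over-parameterized

# Ground-truth low-rank CP tensor  T = sum_t a_t (x) b_t (x) c_t
A0, B0, C0 = (rng.standard_normal((n, true_rank)) for _ in range(3))
T = np.einsum('it,jt,kt->ijk', A0, B0, C0)

# Over-parameterized factors with small initialization.
A, B, C = (0.1 * rng.standard_normal((n, m)) for _ in range(3))

lr = 0.02
for step in range(20000):
    R = np.einsum('it,jt,kt->ijk', A, B, C) - T      # residual
    gA = np.einsum('ijk,jt,kt->it', R, B, C)         # d(loss)/dA
    gB = np.einsum('ijk,it,kt->jt', R, A, C)
    gC = np.einsum('ijk,it,jt->kt', R, A, B)
    A, B, C = A - lr * gA, B - lr * gB, C - lr * gC

# Residual norm should approach 0; if it stalls, adjust lr or the step count.
print(np.linalg.norm(np.einsum('it,jt,kt->ijk', A, B, C) - T))
```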
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn, via gradient descent, a target with small loss using polynomially many samples, beyond the Neural Tangent Kernel regime.
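A toy version of these dynamics (hypothetical sizes, teacher, and step size; not the paper's regime or analysis):

```python
import numpy as np

# Gradient descent on a two-layer ReLU network
# yhat(x) = sum_i a_i * relu(<w_i, x>) under the square loss.
rng = np.random.default_rng(2)
d, width, n = 5, 50, 200                          # width > d: over-parametrized
X = rng.standard_normal((n, d))
y = np.maximum(X @ rng.standard_normal(d), 0.0)   # teacher: a single ReLU

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

lr = 0.05
for _ in range(3000):
    H = np.maximum(X @ W.T, 0.0)                  # hidden activations, n x width
    r = H @ a - y                                 # residuals
    ga = H.T @ r / n                              # outer-layer gradient
    gW = ((r[:, None] * (H > 0)) * a).T @ X / n   # inner-layer gradient
    a -= lr * ga
    W -= lr * gW

# Training loss after GD; should be much smaller than the initial loss.
print(np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))
```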
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Constant-Depth and Subcubic-Size Threshold Circuits for Matrix
Multiplication [1.9518237361775532]
Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility.
We describe a theoretical approach for multiplying two $N$ by $N$ matrices that integrates threshold gate logic.
Dense matrix multiplication is a core operation in convolutional neural network training.
arXiv Detail & Related papers (2020-06-25T18:28:10Z) - Training (Overparametrized) Neural Networks in Near-Linear Time [21.616949485102342]
We show how to speed up the algorithm of [CGH+19] for training (mildly overparametrized) ReLU networks.
The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem.
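A minimal sketch of the Gauss-Newton-as-regression idea on a toy model (the paper's contribution is a fast solver for this regression step; the exponential model and parameters below are illustrative):

```python
import numpy as np

# Gauss-Newton for L(w) = 0.5 * ||f(w) - y||^2: each iteration solves the
# l2-regression  min_d || J(w) d - (y - f(w)) ||_2  and updates w <- w + d.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
w_true = np.array([2.0, -3.0])
f = lambda w: w[0] * np.exp(w[1] * x)                    # model output
J = lambda w: np.stack([np.exp(w[1] * x),                # df/dw0
                        w[0] * x * np.exp(w[1] * x)], 1) # df/dw1
y = f(w_true) + 0.01 * rng.standard_normal(x.size)

w = np.array([1.0, -1.0])
for _ in range(10):
    d, *_ = np.linalg.lstsq(J(w), y - f(w), rcond=None)  # the regression step
    w = w + d
print(w)   # typically converges to approximately w_true here
```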
arXiv Detail & Related papers (2020-06-20T20:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.