Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
- URL: http://arxiv.org/abs/2602.21797v1
- Date: Wed, 25 Feb 2026 11:22:31 GMT
- Title: Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
- Authors: Paolo Andreini, Alessandra Bernardi, Monica Bianchini, Barbara Toniella Corradini, Sara Marziali, Giacomo Nunziati, Franco Scarselli
- Abstract summary: Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$.
- Score: 36.2561379432247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. Across many independent runs the network always converges to a rank-$7$ tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ could actually be the smallest effective rank of the $3\times 3$ matrix--multiplication tensor. We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$--parametrisation and report preliminary results consistent with the known bounds for the border rank of the $3\times 3$ matrix--multiplication tensor.
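To make the rank-$7$ claim concrete, here is a small numpy sketch (not the paper's code) that builds the $2\times 2$ matrix--multiplication tensor and checks that Strassen's classical factors give an exact rank-$7$ decomposition of it:

```python
import numpy as np

# The 2x2 matrix multiplication tensor T: with row-major flattening,
# C[a, b] = sum_c A[a, c] * B[c, b] means T[2a + c, 2c + b, 2a + b] = 1.
T = np.zeros((4, 4, 4))
for a in range(2):
    for b in range(2):
        for c in range(2):
            T[2 * a + c, 2 * c + b, 2 * a + b] = 1

# Strassen's factors: row t of U/V/W holds the coefficients of the t-th
# product m_t = (U[t] . vec(A)) * (V[t] . vec(B)), and vec(C) = W.T @ m.
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# Rank-7 reconstruction: T[i, j, k] = sum_t U[t, i] * V[t, j] * W[t, k].
assert np.allclose(np.einsum('ti,tj,tk->ijk', U, V, W), T)
```

This is exactly the object the network is trained to recover: the factors $(U, V, W)$ are the learnable parameters, and the loss measures how far their rank-$r$ sum is from $T$.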
Related papers
- $XX^{t}$ Can Be Faster [18.4199325543047]
We present RXTX, a new algorithm for computing the product of a matrix with its transpose, $XX^{t}$ for $X\in\mathbb{R}^{n\times m}$. RXTX uses $5\%$ fewer multiplications and $5\%$ fewer operations (additions and multiplications) than state-of-the-art algorithms.
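As a point of reference for where such savings come from (a baseline sketch, not the RXTX recursion itself): splitting $X$ into two row blocks makes $XX^{t}$ a $2\times 2$ block matrix whose off-diagonal blocks are transposes of each other, so only three of the four block products are needed.

```python
import numpy as np

def gram_blocked(X):
    """Compute X @ X.T from two row blocks of X, reusing the off-diagonal
    block product: the (2,1) block is the transpose of the (1,2) block,
    so only 3 of the 4 block products are formed."""
    h = X.shape[0] // 2
    X1, X2 = X[:h], X[h:]
    G11 = X1 @ X1.T          # symmetric block (itself a smaller XX^t)
    G22 = X2 @ X2.T          # symmetric block
    G12 = X1 @ X2.T          # the only general block product needed
    return np.block([[G11, G12], [G12.T, G22]])

X = np.random.randn(6, 4)
assert np.allclose(gram_blocked(X), X @ X.T)
```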
arXiv Detail & Related papers (2025-05-14T21:31:44Z) - Overcomplete Tensor Decomposition via Koszul-Young Flattenings [56.82556231289414]
We give a new algorithm for decomposing an $n_1\times n_2\times n_3$ tensor as the sum of a minimal number of rank-1 terms. We show that an even more general class of degree-$d$ flattenings cannot surpass rank $Cn$ for a constant $C = C(d)$.
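For readers unfamiliar with the terminology, the following illustrative snippet (random factors, arbitrary sizes) spells out what "sum of rank-1 terms" means for a third-order tensor:

```python
import numpy as np

# A rank-1 term of an n1 x n2 x n3 tensor is an outer product a (x) b (x) c;
# a rank-r decomposition writes the tensor as the sum of r such terms.
rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 5, 6, 3
A = rng.standard_normal((n1, r))   # columns: the a-vectors
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# T[i, j, k] = sum_t A[i, t] * B[j, t] * C[k, t]
T = np.einsum('it,jt,kt->ijk', A, B, C)

# Check one einsum against the explicit sum of rank-1 outer products.
T_sum = sum(np.multiply.outer(np.multiply.outer(A[:, t], B[:, t]), C[:, t])
            for t in range(r))
assert np.allclose(T, T_sum)
```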
arXiv Detail & Related papers (2024-11-21T17:41:09Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
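A minimal sketch of the learning problem being studied, with hypothetical parameter choices ($d$, $k$, and the sample size below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 10, 3, 5000

# Target: a linear combination of k ReLU activations,
# f(x) = sum_i c_i * relu(<w_i, x>).
W = rng.standard_normal((k, d))
c = rng.standard_normal(k)
relu = lambda z: np.maximum(z, 0.0)
f = lambda X: relu(X @ W.T) @ c

# Samples from the standard Gaussian on R^d; square loss of a hypothesis.
X = rng.standard_normal((n, d))
y = f(X)

def square_loss(W_hat, c_hat):
    return np.mean((relu(X @ W_hat.T) @ c_hat - y) ** 2)

print(square_loss(W, c))          # 0.0 at the target
print(square_loss(rng.standard_normal((k, d)), rng.standard_normal(k)))
```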
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Fast Matrix Multiplication Without Tears: A Constraint Programming
Approach [8.52818380743467]
It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests.
This gives rise to the constraint satisfaction problem of fast matrix multiplication.
We propose a simple yet novel Constraint Programming approach to find non-commutative algorithms for fast matrix multiplication.
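To illustrate the constraint satisfaction formulation: a rank-$r$ solution is a triple of coefficient matrices $(U, V, W)$ whose rank-1 terms sum to the matrix--multiplication tensor. The sketch below (illustrative code, not the paper's model) builds the trivial rank-8 solution for $2\times 2$ matrices and verifies it on random inputs; a constraint solver would instead search for a feasible assignment at rank 7.

```python
import numpy as np
from itertools import product

# The CSP: find r triples (u_t, v_t, w_t) with
#   sum_t u_t[i] * v_t[j] * w_t[k] = T[i, j, k]  (the matmul tensor).
# Below: the trivial rank-8 solution, one term per scalar product
# A[a, c] * B[c, b], checked by running the bilinear algorithm it encodes.
r, n = 8, 2
U, V, W = (np.zeros((r, n * n)) for _ in range(3))
for t, (a, b, c) in enumerate(product(range(n), range(n), range(n))):
    U[t, n * a + c] = 1      # picks A[a, c]
    V[t, n * c + b] = 1      # picks B[c, b]
    W[t, n * a + b] = 1      # contributes to C[a, b]

A, B = np.random.randn(n, n), np.random.randn(n, n)
m = (U @ A.ravel()) * (V @ B.ravel())      # the r scalar multiplications
C = (W.T @ m).reshape(n, n)
assert np.allclose(C, A @ B)
```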
arXiv Detail & Related papers (2023-06-01T19:15:24Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
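For orientation, this is the classical TT-SVD baseline that such results are usually measured against (a sketch of the standard construction, not the paper's bicriteria algorithm):

```python
import numpy as np

def tt_svd(T, max_rank):
    """Classical TT-SVD: factor a d-way tensor into a train of 3-way cores
    by repeated reshaping + truncated SVD."""
    dims = T.shape
    cores, rank = [], 1
    M = T.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        Uk, s, Vt = np.linalg.svd(M, full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(Uk[:, :r_new].reshape(rank, dims[k], r_new))
        M = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * dims[k + 1], -1)
        rank = r_new
    cores.append(M.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

T = np.random.randn(4, 5, 6, 7)
cores = tt_svd(T, max_rank=40)   # ranks large enough for exact recovery
assert np.allclose(tt_reconstruct(cores), T)
```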
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Training Multi-Layer Over-Parametrized Neural Network in Subquadratic
Time [12.348083977777833]
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function.
In this work, we show how to reduce the training cost per iteration.
arXiv Detail & Related papers (2021-12-14T18:13:36Z) - Reconstruction Algorithms for Low-Rank Tensors and Depth-3 Multilinear
Circuits [4.129484350382891]
We give new and efficient black-box reconstruction algorithms for some classes of depth-$3$ arithmetic circuits.
Our algorithm works over all fields of characteristic $0$ or large enough characteristic.
arXiv Detail & Related papers (2021-05-04T20:45:07Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
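An illustrative toy run of this setting (plain gradient descent on an over-parameterized CP objective; the sizes, step size, and iteration count below are hypothetical and may need tuning):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_rank, m = 8, 2, 10   # m > true_rank: over-parameterized

# Ground-truth low-rank CP tensor  T = sum_t a_t (x) b_t (x) c_t
A0, B0, C0 = (rng.standard_normal((n, true_rank)) for _ in range(3))
T = np.einsum('it,jt,kt->ijk', A0, B0, C0)

# Over-parameterized factors with small initialization.
A, B, C = (0.1 * rng.standard_normal((n, m)) for _ in range(3))

lr = 0.02
for step in range(20000):
    R = np.einsum('it,jt,kt->ijk', A, B, C) - T      # residual
    gA = np.einsum('ijk,jt,kt->it', R, B, C)         # d(loss)/dA
    gB = np.einsum('ijk,it,kt->jt', R, A, C)
    gC = np.einsum('ijk,it,jt->kt', R, A, B)
    A, B, C = A - lr * gA, B - lr * gB, C - lr * gC

# Residual norm should approach 0; if it stalls, adjust lr or the step count.
print(np.linalg.norm(np.einsum('it,jt,kt->ijk', A, B, C) - T))
```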
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn, via gradient descent, a target with small loss using polynomially many samples, beyond the Neural Tangent Kernel regime.
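A toy version of these dynamics (hypothetical sizes, teacher, and step size; not the paper's regime or analysis):

```python
import numpy as np

# Gradient descent on a two-layer ReLU network
# yhat(x) = sum_i a_i * relu(<w_i, x>) under the square loss.
rng = np.random.default_rng(2)
d, width, n = 5, 50, 200                          # width > d: over-parametrized
X = rng.standard_normal((n, d))
y = np.maximum(X @ rng.standard_normal(d), 0.0)   # teacher: a single ReLU

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

lr = 0.05
for _ in range(3000):
    H = np.maximum(X @ W.T, 0.0)                  # hidden activations, n x width
    r = H @ a - y                                 # residuals
    ga = H.T @ r / n                              # outer-layer gradient
    gW = ((r[:, None] * (H > 0)) * a).T @ X / n   # inner-layer gradient
    a -= lr * ga
    W -= lr * gW

# Training loss after GD; should be much smaller than the initial loss.
print(np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))
```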
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Constant-Depth and Subcubic-Size Threshold Circuits for Matrix
Multiplication [1.9518237361775532]
Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility.
We describe a theoretical approach for multiplying two $N$ by $N$ matrices that integrates threshold gate logic.
Dense matrix multiplication is a core operation in convolutional neural network training.
arXiv Detail & Related papers (2020-06-25T18:28:10Z) - Training (Overparametrized) Neural Networks in Near-Linear Time [21.616949485102342]
We show how to speed up the algorithm of [CGH+19] for training (mildly overparametrized) ReLU networks.
The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem.
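A minimal sketch of the Gauss-Newton-as-regression idea on a toy model (the paper's contribution is a fast solver for this regression step; the exponential model and parameters below are illustrative):

```python
import numpy as np

# Gauss-Newton for L(w) = 0.5 * ||f(w) - y||^2: each iteration solves the
# l2-regression  min_d || J(w) d - (y - f(w)) ||_2  and updates w <- w + d.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
w_true = np.array([2.0, -3.0])
f = lambda w: w[0] * np.exp(w[1] * x)                    # model output
J = lambda w: np.stack([np.exp(w[1] * x),                # df/dw0
                        w[0] * x * np.exp(w[1] * x)], 1) # df/dw1
y = f(w_true) + 0.01 * rng.standard_normal(x.size)

w = np.array([1.0, -1.0])
for _ in range(10):
    d, *_ = np.linalg.lstsq(J(w), y - f(w), rcond=None)  # the regression step
    w = w + d
print(w)   # typically converges to approximately w_true here
```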
arXiv Detail & Related papers (2020-06-20T20:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.