PRISM: Distribution-free Adaptive Computation of Matrix Functions for Accelerating Neural Network Training
- URL: http://arxiv.org/abs/2601.22137v1
- Date: Thu, 29 Jan 2026 18:55:46 GMT
- Title: PRISM: Distribution-free Adaptive Computation of Matrix Functions for Accelerating Neural Network Training
- Authors: Shenghao Yang, Zhichao Wang, Oleg Balabanov, N. Benjamin Erichson, Michael W. Mahoney
- Abstract summary: We present PRISM (Polynomial-fitting and Randomized Iterative Sketching for Matrix functions computation), a general framework for accelerating iterative algorithms for computing matrix functions. PRISM combines adaptive polynomial approximation with randomized sketching: at each iteration, it fits a polynomial surrogate to the current spectrum via a sketched least-squares problem. Unlike prior methods, PRISM requires no explicit spectral bounds or singular value estimates, and it adapts automatically to the evolving spectrum.
- Score: 47.80717552769429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matrix functions such as square roots, inverse roots, and orthogonalization play a central role in preconditioned gradient methods for neural network training. This has motivated the development of iterative algorithms that avoid explicit eigendecompositions and rely primarily on matrix multiplications, making them well suited for modern GPU accelerators. We present PRISM (Polynomial-fitting and Randomized Iterative Sketching for Matrix functions computation), a general framework for accelerating iterative algorithms for computing matrix functions. PRISM combines adaptive polynomial approximation with randomized sketching: at each iteration, it fits a polynomial surrogate to the current spectrum via a sketched least-squares problem, adapting to the instance at hand with minimal overhead. We apply PRISM to accelerate Newton-Schulz-like iterations for matrix square roots and orthogonalization, which are core primitives in machine learning. Unlike prior methods, PRISM requires no explicit spectral bounds or singular value estimates, and it adapts automatically to the evolving spectrum. Empirically, PRISM accelerates training when integrated into the Shampoo and Muon optimizers.
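The abstract singles out Newton-Schulz-like iterations for orthogonalization as a core primitive that PRISM accelerates, but it does not spell out the PRISM update itself. The snippet below is therefore only a minimal NumPy sketch of the standard fixed-coefficient Newton-Schulz baseline, with a comment marking where a PRISM-style method would instead fit polynomial coefficients to the current (sketched) spectrum; the function name, iteration count, and matrix sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(A, num_iters=20):
    """Fixed-coefficient Newton-Schulz iteration for the orthogonal (polar) factor of A."""
    # Normalize so all singular values lie in (0, 1], which guarantees convergence.
    X = A / np.linalg.norm(A, ord="fro")
    I = np.eye(A.shape[1])
    for _ in range(num_iters):
        # Fixed cubic update X <- X (1.5 I - 0.5 X^T X); a PRISM-style method would
        # instead refit these polynomial coefficients to the current (sketched)
        # spectrum of X^T X at every step.
        X = X @ (1.5 * I - 0.5 * (X.T @ X))
    return X

# Usage: orthogonalize a tall random matrix and check that X^T X is close to the identity.
A = np.random.randn(64, 32)
X = newton_schulz_orthogonalize(A)
print(np.linalg.norm(X.T @ X - np.eye(32)))
```

Each step costs only matrix multiplications, which is why such iterations map well onto GPU accelerators; PRISM's contribution, per the abstract, is choosing the polynomial adaptively rather than using the fixed coefficients 1.5 and -0.5.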
Related papers
- Learning to Discover Iterative Spectral Algorithms [46.39331984989963]
AutoSpec is a neural network framework for discovering iterative spectral algorithms for large-scale numerical linear algebra. We apply AutoSpec to discover algorithms for representative numerical linear algebra tasks. On real-world solvers, the learned procedures deliver orders-of-magnitude improvements in accuracy and/or reductions in iteration count.
arXiv Detail & Related papers (2026-02-10T08:39:59Z) - MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search [37.24058519921229]
MatRL is a reinforcement learning framework that automatically discovers iterative algorithms for computing matrix functions. We show that MatRL produces algorithms that outperform various baselines in the literature.
arXiv Detail & Related papers (2025-07-04T22:57:33Z) - Tensor networks based quantum optimization algorithm [0.0]
In optimization, one of the best-known classical algorithms is power iteration.
We propose a quantum realization to circumvent this pitfall.
Our methodology becomes instance agnostic and thus allows one to address black-box optimization within the framework of quantum computing.
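The entry above names power iteration as the well-known classical baseline that the proposed quantum realization builds on. For reference only, here is a minimal NumPy sketch of that classical routine (nothing quantum-specific from the paper is reproduced); the symmetric test matrix, tolerance, and iteration cap are illustrative.

```python
import numpy as np

def power_iteration(A, num_iters=100, tol=1e-10):
    """Classic power iteration: estimates the dominant eigenpair of a symmetric matrix A."""
    v = np.random.randn(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iters):
        w = A @ v
        lam_new = v @ w              # Rayleigh quotient estimate of the top eigenvalue
        v = w / np.linalg.norm(w)    # renormalize the iterate
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, v

# Usage on a random symmetric matrix.
M = np.random.randn(50, 50)
A = (M + M.T) / 2
lam, v = power_iteration(A)
```

With a random start, the iterate converges to the eigenvector of the dominant eigenvalue whenever that eigenvalue is strictly larger in magnitude than the others.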
arXiv Detail & Related papers (2024-04-23T13:49:11Z) - GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function, intrinsic in the decay of its spectrum, we are able to obtain precise certificates and, at the same time, leverage advanced and powerful neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z) - Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
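The CountSketch entry above describes the sketch-and-solve paradigm: compress the data by multiplying with a random sketch matrix, then solve the much smaller problem. Below is a minimal NumPy sketch of that paradigm using a plain CountSketch (one random ±1 entry per column, with unlearned positions, unlike the learned placements the paper proposes); the sketch size and problem dimensions are illustrative.

```python
import numpy as np

def countsketch_lstsq(A, b, sketch_rows, seed=None):
    """Sketch-and-solve least squares with a CountSketch matrix.

    Each column of the implicit sketch matrix S has a single +/-1 entry at a
    random row, so S @ A can be formed in one pass over the rows of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows = rng.integers(0, sketch_rows, size=n)    # hash each data row to a sketch row
    signs = rng.choice([-1.0, 1.0], size=n)        # random sign per data row
    SA = np.zeros((sketch_rows, A.shape[1]))
    Sb = np.zeros(sketch_rows)
    np.add.at(SA, rows, signs[:, None] * A)        # accumulate signed rows of A
    np.add.at(Sb, rows, signs * b)                 # accumulate signed entries of b
    # Solve the much smaller sketched least-squares problem.
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

# Usage: approximate the solution of an overdetermined least-squares problem.
A = np.random.randn(10000, 20)
b = np.random.randn(10000)
x_sketch = countsketch_lstsq(A, b, sketch_rows=400)
```

Because each column of the sketch has a single nonzero, the sketched system can be built in one pass over the data, which is what makes CountSketch attractive for large least-squares problems.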
arXiv Detail & Related papers (2023-06-11T07:28:35Z) - Matrix Reordering for Noisy Disordered Matrices: Optimality and
Computationally Efficient Algorithms [9.245687221460654]
Motivated by applications in single-cell biology and metagenomics, we investigate the problem of matrix reordering based on a noisy monotone Toeplitz matrix model.
We establish the fundamental statistical limit for this problem in a decision-theoretic framework and demonstrate that a constrained least squares estimator achieves the optimal rate.
To address its computational cost, we propose a novel polynomial-time adaptive sorting algorithm with guaranteed performance improvement.
arXiv Detail & Related papers (2022-01-17T14:53:52Z) - The Fast Kernel Transform [21.001203328543006]
We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications for datasets in moderate dimensions with quasilinear complexity.
The FKT is easily applied to a broad class of kernels, including Gaussian, Matern, and Rational Quadratic covariance functions and physically motivated Green's functions.
We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale t-distributed stochastic neighbor embedding (t-SNE) and Gaussian processes to large real-world data sets.
arXiv Detail & Related papers (2021-06-08T16:15:47Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale problems in which the predictive model is a deep neural network.
Our method requires a much smaller number of communication rounds while retaining its theoretical guarantees.
Our experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.