Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning
- URL: http://arxiv.org/abs/2405.05865v1
- Date: Thu, 9 May 2024 15:53:43 GMT
- Title: Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning
- Authors: Michał Dereziński, Christopher Musco, Jiaming Yang
- Abstract summary: We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$.
Our methods are based on constructing a low-rank Nystr\"om approximation to $A$ using sparse random sketching.
We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nystr\"om approximation increases.
- Score: 10.690769339903941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nystr\"om approximation to $A$ using sparse random sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nystr\"om approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems: 1. We show how to solve any $n\times n$ linear system that is well-conditioned except for $k$ outlying large singular values in $\tilde{O}(n^{2.065} + k^\omega)$ time, improving on a recent result of [Derezi\'nski, Yang, STOC 2024] for all $k \gtrsim n^{0.78}$. 2. We give the first $\tilde{O}(n^2 + d_\lambda^{\omega})$ time algorithm for solving a regularized linear system $(A + \lambda I)x = b$, where $A$ is positive semidefinite with effective dimension $d_\lambda$. This problem arises in applications like Gaussian process regression. 3. We give faster algorithms for approximating Schatten $p$-norms and other matrix norms. For example, for the Schatten 1 (nuclear) norm, we give an algorithm that runs in $\tilde{O}(n^{2.11})$ time, improving on an $\tilde{O}(n^{2.18})$ method of [Musco et al., ITCS 2018]. Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
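To make the high-level recipe concrete, here is a minimal NumPy sketch of Nystr\"om-preconditioned conjugate gradient for a dense PSD system. It is our illustration, not the authors' implementation: it uses a Gaussian test matrix instead of the paper's sparse sketches, and a direct factorization instead of inverting the preconditioner with further levels of sketching; the function name `nystrom_pcg` is ours.

```python
import numpy as np

def nystrom_pcg(A, b, rank=50, tol=1e-8, max_iter=500, seed=0):
    """Toy sketch: randomized Nystrom approximation of PSD A -> preconditioned CG.

    Simplified relative to the paper: Gaussian test matrix, dense A,
    direct eigendecomposition of the sketch.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]

    # Randomized Nystrom approximation A ~= U diag(lam) U^T of the given rank.
    Omega = rng.standard_normal((n, rank))
    Y = A @ Omega
    nu = 1e-10 * np.linalg.norm(Y)          # small shift for numerical stability
    Y_nu = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y_nu)  # lower-triangular factor
    B = np.linalg.solve(C, Y_nu.T).T        # B = Y_nu C^{-T}
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s**2 - nu, 0.0)

    # Preconditioner: damp the captured top eigendirections, identity elsewhere.
    mu = lam[-1] + 1e-12                    # smallest retained eigenvalue
    def apply_Pinv(v):
        Utv = U.T @ v
        return U @ ((mu / (lam + mu)) * Utv) + (v - U @ Utv)

    # Standard preconditioned conjugate gradient on Ax = b.
    x = np.zeros(n)
    r = b - A @ x
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_Pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

The larger the rank of the Nystr\"om approximation, the more outlying eigendirections the preconditioner captures, which is the mechanism behind the average-condition-number convergence bound in the abstract.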
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
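As a toy illustration of the basic mechanism (linear sketching for norm estimation, not this paper's estimator), a Gaussian sketch gives an unbiased estimate of a squared Frobenius norm:

```python
import numpy as np

def sketched_frobenius_estimate(A, m=64, seed=0):
    """Estimate ||A||_F^2 from the linear sketch A -> G A.

    For G with i.i.d. N(0,1) rows, E[||G A||_F^2] = m * ||A||_F^2,
    so the estimator below is unbiased. Our simplification; the paper
    targets residual errors and other norms with tighter sketches.
    """
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, A.shape[0]))
    return np.linalg.norm(G @ A, "fro") ** 2 / m
```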
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm has an $\tilde{O}(n^2)$ runtime when $k=O(n^{0.729})$.
Our main algorithm can be viewed as a randomized block coordinate descent method.
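For readers unfamiliar with the template, a bare-bones randomized block coordinate descent for a PSD system looks like the following (our toy with uniform block sampling; the paper's sampling and preconditioning are what deliver its stated runtime):

```python
import numpy as np

def randomized_block_cd(A, b, block_size=32, sweeps=200, seed=0):
    """Toy randomized block coordinate descent for PSD Ax = b.

    Each step picks a random coordinate block S and exactly minimizes
    0.5 x^T A x - b^T x over the coordinates in S.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    for _ in range(sweeps):
        S = rng.choice(n, size=block_size, replace=False)  # random block
        r = b[S] - A[S] @ x                                 # block residual
        x[S] += np.linalg.solve(A[np.ix_(S, S)], r)         # exact block update
    return x
```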
arXiv Detail & Related papers (2023-12-14T12:53:34Z)
- Quantum speedups for linear programming via interior point methods [1.8434042562191815]
We describe a quantum algorithm for solving a linear program with $n$ inequality constraints on $d$ variables.
Our algorithm speeds up the Newton step in the state-of-the-art interior point method of Lee and Sidford.
arXiv Detail & Related papers (2023-11-06T16:00:07Z)
- Structured Semidefinite Programming for Recovering Structured Preconditioners [41.28701750733703]
We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time nearly linear in $\mathrm{nnz}(\mathbf{K})$.
We attain our results via new algorithms for a class of semidefinite programs we call matrix-dictionary approximation SDPs.
arXiv Detail & Related papers (2023-10-27T16:54:29Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- Perseus: A Simple and Optimal High-Order Method for Variational Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star\rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line search procedure and provably converges to a weak solution at a rate of $O(\epsilon^{-2/(p+1)})$.
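For context (our added note, not from the abstract), the "weak solution" in the rate above is the Minty solution, contrasted here with the strong solution:

```latex
% Strong solution: the first-order condition at x*.
\langle F(x^\star),\, x - x^\star \rangle \ge 0 \quad \forall\, x \in \mathcal{X}
% Weak (Minty) solution: the notion used in the convergence rate above.
\langle F(x),\, x - x^\star \rangle \ge 0 \quad \forall\, x \in \mathcal{X}
% For F monotone and continuous the two notions coincide (Minty's lemma).
```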
arXiv Detail & Related papers (2022-05-06T13:29:14Z)
- Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
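The sketch-and-precondition pattern behind such algorithms can be sketched in a few lines of NumPy (our simplified illustration with a dense Gaussian sketch; the paper's sketch sizes and algorithm differ):

```python
import numpy as np

def sketched_ridge(A, b, lam, sketch_rows=None, tol=1e-10, max_iter=100, seed=0):
    """Toy sketch-and-precondition for ridge regression:
    min_x ||Ax - b||^2 + lam ||x||^2, i.e. (A^T A + lam I) x = A^T b.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = sketch_rows or 4 * d
    SA = rng.standard_normal((m, n)) @ A / np.sqrt(m)    # sketched matrix S A
    L = np.linalg.cholesky(SA.T @ SA + lam * np.eye(d))  # preconditioner factor

    def apply_Minv(v):                                   # M^{-1} v via two solves
        return np.linalg.solve(L.T, np.linalg.solve(L, v))

    # Preconditioned CG on the regularized normal equations.
    rhs = A.T @ b
    x = np.zeros(d)
    r = rhs - (A.T @ (A @ x) + lam * x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A.T @ (A @ p) + lam * p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(rhs):
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```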
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
- A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual proximal steps.
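For reference, here is the non-adaptive Condat-Vu baseline with fixed step sizes, specialized to the instance $f(x) = \frac{1}{2}\|x - c\|^2$ and $g = \|\cdot\|_1$ (so $g^*$ is the indicator of the $\ell_\infty$ unit ball and its prox is a clip); the paper's contribution, choosing the primal step adaptively from local smoothness, is not shown:

```python
import numpy as np

def condat_vu(A, c, n_iter=2000):
    """Fixed-step Condat-Vu for min_x 0.5||x - c||^2 + ||A x||_1,
    in saddle form min_x max_y f(x) + <Ax, y> - g*(y)."""
    m, n = A.shape
    L_f = 1.0                                # Lipschitz constant of grad f
    opA = np.linalg.norm(A, 2)               # operator norm ||A||
    tau = 1.0 / (L_f / 2 + opA)              # primal step
    sigma = (1.0 / tau - L_f / 2) / opA**2   # dual step: 1/tau - sigma*||A||^2 >= L_f/2
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        x_new = x - tau * ((x - c) + A.T @ y)                      # primal gradient step
        y = np.clip(y + sigma * (A @ (2 * x_new - x)), -1.0, 1.0)  # dual proximal step
        x = x_new
    return x
```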
arXiv Detail & Related papers (2021-10-28T14:19:30Z)
- Learning Stochastic Shortest Path with Linear Function Approximation [74.08819218747341]
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models.
We propose a novel algorithm for learning the linear mixture SSP, which can attain a $\tilde{O}(d B_\star^{1.5}\sqrt{K}/c_{\min})$ regret.
arXiv Detail & Related papers (2021-10-25T08:34:00Z)
- Fast and Near-Optimal Diagonal Preconditioning [46.240079312553796]
We show how to best improve $\mathbf{A}$'s condition number by left or right diagonal rescaling.
We give a solver for structured mixed packing and covering semidefinite programs which computes a constant-factor optimal scaling for $\mathbf{A}$ in $\widetilde{O}(\text{nnz}(\mathbf{A}) \cdot \text{poly}(\kappa^\star))$ time.
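The phenomenon being optimized is easy to see numerically; the demo below uses classical Jacobi (unit-diagonal) rescaling, not the paper's near-optimal solver:

```python
import numpy as np

# Build a PSD matrix whose ill-conditioning comes from bad row/column scaling.
rng = np.random.default_rng(0)
d = 200
B = rng.standard_normal((d, d))
scales = 10.0 ** rng.uniform(-3, 3, size=d)
A = (B @ B.T + d * np.eye(d)) * np.outer(scales, scales)

# Two-sided Jacobi rescaling D A D with D = diag(A)^{-1/2}.
D = 1.0 / np.sqrt(np.diag(A))
A_scaled = A * np.outer(D, D)

print("cond(A)     =", np.linalg.cond(A))
print("cond(D A D) =", np.linalg.cond(A_scaled))   # dramatically smaller
```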
arXiv Detail & Related papers (2020-08-04T17:53:28Z)