Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems
- URL: http://arxiv.org/abs/2405.05818v2
- Date: Tue, 17 Jun 2025 04:24:53 GMT
- Title: Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems
- Authors: Michał Dereziński, Daniel LeJeune, Deanna Needell, Elizaveta Rebrova,
- Abstract summary: Despite being a key bottleneck in many machine learning tasks, the cost of solving large linear systems has proven challenging to quantify.<n>We consider a fine-grained notion of complexity for solving linear systems, which is motivated by applications where the data exhibits low-dimensional structure.<n>We give a algorithm based on the Sketch-and-Project paradigm, that solves the linear system $Ax = b$, that is, finds $barx$ such that $|Abarx - b| le epsilon |b|$, in time
- Score: 9.30306458153248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite being a key bottleneck in many machine learning tasks, the cost of solving large linear systems has proven challenging to quantify due to problem-dependent quantities such as condition numbers. To tackle this, we consider a fine-grained notion of complexity for solving linear systems, which is motivated by applications where the data exhibits low-dimensional structure, including spiked covariance models and kernel machines, and when the linear system is explicitly regularized, such as ridge regression. Concretely, let $\kappa_\ell$ be the ratio between the $\ell$th largest and the smallest singular value of $n\times n$ matrix $A$. We give a stochastic algorithm based on the Sketch-and-Project paradigm, that solves the linear system $Ax = b$, that is, finds $\bar{x}$ such that $\|A\bar{x} - b\| \le \epsilon \|b\|$, in time $\bar O(\kappa_\ell\cdot n^2\log 1/\epsilon)$, for any $\ell = O(n^{0.729})$. This is a direct improvement over preconditioned conjugate gradient, and it provides a stronger separation between stochastic linear solvers and algorithms accessing $A$ only through matrix-vector products. Our main technical contribution is the new analysis of the first and second moments of the random projection matrix that arises in Sketch-and-Project.
Related papers
- Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure [16.324043075920564]
We provide new high-accuracy randomized algorithms for solving linear systems and regression problems.<n>Our algorithms nearly-match a natural complexity limit under dense inputs for these problems.<n>We show how to obtain these running times even under the weaker assumption that all but $k$ of the singular values have a bounded generalized mean.
arXiv Detail & Related papers (2025-07-15T20:48:30Z) - Quantum linear system algorithm with optimal queries to initial state preparation [0.0]
Quantum algorithms for linear systems produce the solution state $A-1|brangle$ by querying two oracles.
We present a quantum linear system algorithm making $mathbfThetaleft (1/sqrtpright)$ queries to $O_b$, which is optimal in the success probability.
In various applications such as solving differential equations, we can further improve the dependence on $p$ using a block preconditioning scheme.
arXiv Detail & Related papers (2024-10-23T18:00:01Z) - Combinatorial optimization of the coefficient of determination [0.0]
We develop an efficient algorithm for selecting the $k$- subset of $n$ points in the plane with the highest coefficient of determination.
We experimentally demonstrate our method's optimality over several million trials up to $n=30$ without error.
arXiv Detail & Related papers (2024-10-12T00:53:25Z) - Faster Sampling from Log-Concave Densities over Polytopes via Efficient Linear Solvers [29.212403229351253]
We present a nearly-optimal implementation of this Markov chain with per-step complexity which is roughly the number of non-zero entries of $A$ while the number of Markov chain steps remains the same.
Key technical ingredients are 1) to show that the matrices that arise in this Dikin walk change slowly, 2) to deploy efficient linear solvers that can leverage this slow change to speed up matrix inversion by using information computed in previous steps, and 3) to speed up the computation of the determinantal term in the Metropolis filter step via a randomized Taylor series-based estimator.
arXiv Detail & Related papers (2024-09-06T14:49:43Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning [10.690769339903941]
We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$.<n>Our methods are based on constructing a low-rank Nystr"om approximation to $A$ using sparse random sketching matrix.<n>We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nystr"om approximation increases.
arXiv Detail & Related papers (2024-05-09T15:53:43Z) - Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$.
In particular, our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$.
Our main algorithm can be viewed as a randomized block coordinate descent method.
arXiv Detail & Related papers (2023-12-14T12:53:34Z) - Fast Minimization of Expected Logarithmic Loss via Stochastic Dual
Averaging [8.990961435218544]
We propose a first-order algorithm named $B$-sample dual averaging with the logarithmic barrier.
For the Poisson inverse problem, our algorithm attains an $varepsilon$ solution in $smashtildeO(d3/varepsilon2)$ time.
When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $varepsilon$-optimal solution in $smashtildeO(d3/varepsilon2)$ time.
arXiv Detail & Related papers (2023-11-05T03:33:44Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Fast and Practical Quantum-Inspired Classical Algorithms for Solving
Linear Systems [11.929584800629673]
We propose fast and practical quantum-inspired classical algorithms for solving linear systems.
Our main contribution is the application of the heavy ball momentum method to quantum-inspired classical algorithms for solving linear systems.
arXiv Detail & Related papers (2023-07-13T08:46:19Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Linear Bandit Algorithms with Sublinear Time Complexity [67.21046514005029]
We propose to accelerate existing linear bandit algorithms to achieve per-step time complexity sublinear in the number of arms $K$.
We show that our proposed algorithms can achieve $O(K1-alpha(T))$ per-step complexity for some $alpha(T) > 0$ and $widetilde O(stT)$ regret, where $T$ is the time horizon.
arXiv Detail & Related papers (2021-03-03T22:42:15Z) - Quantum algorithms for spectral sums [50.045011844765185]
We propose new quantum algorithms for estimating spectral sums of positive semi-definite (PSD) matrices.
We show how the algorithms and techniques used in this work can be applied to three problems in spectral graph theory.
arXiv Detail & Related papers (2020-11-12T16:29:45Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $ktimes k$, rank-$r$, matrices to normalized $L_1$ distance requires $Omega(frackrepsilon2)$ samples.
We propose an algorithm that uses $cal O(frackrepsilon2log2fracepsilon)$ samples, a number linear in the high dimension, and nearly linear in the matrices, typically low, rank proofs.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Learning nonlinear dynamical systems from a single trajectory [102.60042167341956]
We introduce algorithms for learning nonlinear dynamical systems of the form $x_t+1=sigma(Thetastarx_t)+varepsilon_t$.
We give an algorithm that recovers the weight matrix $Thetastar$ from a single trajectory with optimal sample complexity and linear running time.
arXiv Detail & Related papers (2020-04-30T10:42:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.