Sketching Algorithms and Lower Bounds for Ridge Regression
- URL: http://arxiv.org/abs/2204.06653v1
- Date: Wed, 13 Apr 2022 22:18:47 GMT
- Title: Sketching Algorithms and Lower Bounds for Ridge Regression
- Authors: Praneeth Kacham and David P. Woodruff
- Abstract summary: We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
- Score: 65.0720777731368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give a sketching-based iterative algorithm that computes $1+\varepsilon$
approximate solutions for the ridge regression problem $\min_x \|{Ax-b}\|_2^2
+\lambda\|{x}\|_2^2$ where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. Our
algorithm, for a constant number of iterations (requiring a constant number of
passes over the input), improves upon earlier work of Chowdhury et al. by
requiring only that the sketching matrix satisfy a weaker Approximate Matrix
Multiplication (AMM) guarantee depending on $\varepsilon$, together with a
constant subspace embedding guarantee; the earlier work instead requires the
sketching matrix to have a subspace embedding guarantee that depends on
$\varepsilon$. For example, to produce a $1+\varepsilon$ approximate solution in
$1$ iteration, which requires $2$ passes over the input, our algorithm requires
the OSNAP embedding to have $m = O(n\sigma^2/\lambda\varepsilon)$ rows with
sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury
et al., with the same number of OSNAP rows, requires sparsity $s =
O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma =
\|A\|_2$ is the spectral norm of $A$. We also show that this
algorithm can be used to give faster algorithms for kernel ridge regression.
Finally, we show that the sketch size required for our algorithm is essentially
optimal for a natural framework of algorithms for ridge regression by proving
lower bounds on oblivious sketching matrices for AMM. The sketch size lower
bounds for AMM may be of independent interest.
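To make the sketch-and-solve idea concrete, here is a minimal numpy sketch of an iterative, sketching-based ridge regression solver in the spirit of the abstract. The `osnap_sketch` and `sketched_ridge` functions, the parameter settings, and the refinement loop are illustrative assumptions (a simplified OSNAP-style embedding that allows row collisions), not the authors' exact algorithm or guarantees.

```python
import numpy as np

def osnap_sketch(m, d, s, rng):
    """OSNAP-style sparse embedding: s scaled +/-1 entries per column.

    Simplified stand-in (random rows may collide within a column).
    """
    S = np.zeros((m, d))
    for i in range(s):
        rows = rng.integers(0, m, size=d)
        S[rows, np.arange(d)] += rng.choice([-1.0, 1.0], size=d) / np.sqrt(s)
    return S

def sketched_ridge(A, b, lam, m, s=4, iters=6, seed=0):
    """Iterative, sketching-based ridge regression for d >= n.

    Solves min_x ||Ax - b||_2^2 + lam ||x||_2^2 through the dual form
    x* = A^T (A A^T + lam I)^{-1} b, using the sketched Gram matrix
    A S^T S A^T + lam I as a preconditioner for iterative refinement.
    An illustrative scheme in the spirit of the paper, not the authors'
    exact algorithm.
    """
    n, d = A.shape
    rng = np.random.default_rng(seed)
    SAt = osnap_sketch(m, d, s, rng) @ A.T        # m x n sketch of A^T
    M = SAt.T @ SAt + lam * np.eye(n)             # ~ A A^T + lam I (AMM)
    y = np.zeros(n)
    for _ in range(iters):                        # refinement iterations
        r = b - (A @ (A.T @ y) + lam * y)         # residual of the dual system
        y += np.linalg.solve(M, r)
    return A.T @ y                                # recover the primal solution

# usage on a random instance
rng = np.random.default_rng(1)
A, b = rng.standard_normal((50, 2000)), rng.standard_normal(50)
x = sketched_ridge(A, b, lam=100.0, m=1500)
x_star = A.T @ np.linalg.solve(A @ A.T + 100.0 * np.eye(50), b)
print(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))  # small
```

Note the division of labor: the sketch is used only to build the preconditioner $M$, while each refinement step touches the exact matrix $A$, which is why a weaker (AMM-type) sketching guarantee can suffice.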
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning [10.690769339903941]
We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$.
Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random sketching.
We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nyström approximation increases.
arXiv Detail & Related papers (2024-05-09T15:53:43Z)
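As a companion illustration for the entry above, the sketch below builds the standard randomized Nyström approximation of a PSD matrix and wraps it as a preconditioner for conjugate gradient. It uses a Gaussian test matrix and SciPy's `cg` for brevity, rather than the sparse, multi-level sketching the paper analyzes.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def nystrom_preconditioner(A, k, mu, rng):
    """Apply (A_nys + mu I)^{-1} from a rank-k randomized Nystrom approximation.

    Gaussian test matrix for simplicity; the paper instead uses sparse
    random sketches and a multi-level construction.
    """
    n = A.shape[0]
    Omega = rng.standard_normal((n, k))          # test matrix
    Y = A @ Omega                                # range sketch of A
    nu = 1e-8 * np.linalg.norm(Y)                # stability shift
    Y += nu * Omega
    L = np.linalg.cholesky(Omega.T @ Y)          # Omega^T Y = L L^T
    B = np.linalg.solve(L, Y.T).T                # A_nys + nu I = B B^T
    U, sv, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(sv**2 - nu, 0.0)            # Nystrom eigenvalues
    def apply_inv(x):
        c = U.T @ x                              # component in the range of U
        return U @ (c / (lam + mu)) + (x - U @ c) / mu
    return apply_inv

# usage: precondition CG on a regularized PSD system
rng = np.random.default_rng(0)
G = rng.standard_normal((500, 500))
A, mu = G @ G.T / 500.0, 1e-2
P = LinearOperator((500, 500), matvec=nystrom_preconditioner(A, 50, mu, rng))
x, info = cg(A + mu * np.eye(500), rng.standard_normal(500), M=P)
print(info)  # 0 indicates convergence
```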
- Optimal Embedding Dimension for Sparse Subspace Embeddings [4.042707434058959]
A random $m \times n$ matrix $S$ is an oblivious subspace embedding (OSE) with distortion $\epsilon$ if, with high probability, $\|Sx\|_2 = (1 \pm \epsilon)\|x\|_2$ simultaneously for all $x$ in any fixed $d$-dimensional subspace.
We show that an $m \times n$ random matrix $S$ with $m \geq (1+\theta)d$ is an oblivious subspace embedding with $\epsilon = O_\theta(1)$.
We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time.
arXiv Detail & Related papers (2023-11-17T18:01:58Z)
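A quick numerical illustration of the OSE property from the entry above, under assumed parameters: with $m = (1+\theta)d$ rows and a few nonzeros per column, the singular values of $SU$ stay within constant factors of $1$ for an orthonormal basis $U$ of a random $d$-dimensional subspace. This is a simplified OSNAP-style sketch allowing row collisions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, theta, s = 20_000, 50, 1.0, 4
m = int((1 + theta) * d)                 # embedding dimension m = (1+theta)d

# sparse embedding: s scaled +/-1 entries per column, applied implicitly
rows = rng.integers(0, m, size=(s, n))
vals = rng.choice([-1.0, 1.0], size=(s, n)) / np.sqrt(s)

def sketch(X):
    """Compute S @ X without materializing the m x n matrix S."""
    SX = np.zeros((m, X.shape[1]))
    for i in range(s):
        np.add.at(SX, rows[i], vals[i][:, None] * X)
    return SX

# orthonormal basis of a random d-dimensional subspace of R^n
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

# all singular values of S U near 1  <=>  ||S U z|| = (1 +/- eps) ||U z||
sv = np.linalg.svd(sketch(U), compute_uv=False)
print(sv.min(), sv.max())                # constant distortion: eps = O_theta(1)
```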
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees.
We show that in order to obtain such an $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense.
We also develop a novel analytical framework for $\ell_\infty$-guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property.
arXiv Detail & Related papers (2023-02-01T05:22:40Z)
- Private Isotonic Regression [54.32252900997422]
We consider the problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function.
We obtain a pure-DP algorithm that has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
We show that the bounds above are essentially the best that can be achieved.
arXiv Detail & Related papers (2022-10-27T05:08:07Z)
- Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
arXiv Detail & Related papers (2022-09-30T04:15:44Z)
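For context on the static baseline that the entry's lower bounds address, here is a minimal Hutchinson trace estimator in the matrix-vector query model. The paper's contribution is the dynamic setting, where queries are shared across a slowly changing sequence of matrices; its binary tree summation procedure is not reproduced here.

```python
import numpy as np

def hutchinson_trace(matvec, n, q, rng):
    """Hutchinson's estimator: tr(A) ~ (1/q) * sum_i g_i^T A g_i.

    Uses only matrix-vector queries; with Rademacher vectors g_i, about
    O(1/eps^2) queries give eps * ||A||_F error with constant probability.
    """
    total = 0.0
    for _ in range(q):
        g = rng.choice([-1.0, 1.0], size=n)   # Rademacher query vector
        total += g @ matvec(g)
    return total / q

# usage: estimate the trace of a symmetric test matrix
rng = np.random.default_rng(0)
G = rng.standard_normal((300, 300))
A = (G + G.T) / 2.0
print(hutchinson_trace(lambda v: A @ v, 300, q=500, rng=rng), np.trace(A))
```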
- Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^d$, given access to $A \in \mathbb{R}^{d \times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z)