On the Noise Sensitivity of the Randomized SVD
- URL: http://arxiv.org/abs/2305.17435v2
- Date: Thu, 6 Jul 2023 17:30:41 GMT
- Title: On the Noise Sensitivity of the Randomized SVD
- Authors: Elad Romanov
- Abstract summary: The randomized singular value decomposition (R-SVD) is a popular sketching-based algorithm for efficiently computing the partial SVD of a large matrix.
We analyze the R-SVD under a low-rank signal plus noise measurement model.
The singular values produced by the R-SVD are shown to exhibit a BBP-like phase transition.
- Score: 8.98526174345299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The randomized singular value decomposition (R-SVD) is a popular
sketching-based algorithm for efficiently computing the partial SVD of a large
matrix. When the matrix is low-rank, the R-SVD produces its partial SVD
exactly; but when the rank is large, it only yields an approximation.
Motivated by applications in data science and principal component analysis
(PCA), we analyze the R-SVD under a low-rank signal plus noise measurement
model; specifically, when its input is a spiked random matrix. The singular
values produced by the R-SVD are shown to exhibit a BBP-like phase transition:
when the SNR exceeds a certain detectability threshold, which depends on the
dimension reduction factor, the largest singular value is an outlier; below the
threshold, no outlier emerges from the bulk of singular values. We further
compute asymptotic formulas for the overlap between the ground truth signal
singular vectors and the approximations produced by the R-SVD.
Dimensionality reduction has the adverse effect of amplifying the noise in a
highly nonlinear manner. Our results demonstrate the statistical advantage --
in both signal detection and estimation -- of the R-SVD over more naive
sketched PCA variants; the advantage is especially dramatic when the sketching
dimension is small. Our analysis is asymptotically exact, and substantially
more fine-grained than existing operator-norm error bounds for the R-SVD, which
largely fail to give meaningful error estimates in the moderate SNR regime. It
applies to a broad family of sketching matrices previously considered in the
literature, including Gaussian i.i.d. sketches, random projections, and the
sub-sampled Hadamard transform, among others.
Lastly, we derive an optimal shrinker for the singular values and vectors
obtained through the R-SVD, which may be useful for applications in
matrix denoising.
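A minimal NumPy sketch of the setting described in the abstract: build a rank-one "signal plus noise" (spiked) matrix, run the standard randomized SVD with a Gaussian i.i.d. sketch, and measure the overlap between the recovered and ground-truth singular vectors. All dimensions, the sketch size, and the SNR below are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, sketch dimension, and SNR (not taken from the paper).
n, d = 2000, 1000            # data matrix is n x d
k = 50                       # sketching dimension, k << d
snr = 3.0                    # strength of the rank-one spike

# Rank-one "low-rank signal plus noise" (spiked) matrix: Y = snr * u v^T + noise.
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
Y = snr * np.outer(u, v) + rng.standard_normal((n, d)) / np.sqrt(d)

# Randomized SVD with a Gaussian i.i.d. sketch (Halko-Martinsson-Tropp style).
Omega = rng.standard_normal((d, k))        # sketching matrix
Q, _ = np.linalg.qr(Y @ Omega)             # orthonormal basis for the sketched range
B = Q.T @ Y                                # small k x d matrix
Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
U_rsvd = Q @ Uh                            # approximate left singular vectors of Y

# Overlap (squared cosine) between the top recovered and true singular vectors.
print("top singular value:", s[0])
print("left overlap:", (U_rsvd[:, 0] @ u) ** 2)
print("right overlap:", (Vt[0] @ v) ** 2)
```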
Related papers
- Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis [0.0]
Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers.
This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation.
The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm.
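The summary above only hints at the mechanism. As a rough, hypothetical illustration of the general idea (normalizing rows onto the unit sphere to limit the influence of outlying observations before a standard reduced-rank SVD), and explicitly not the cited paper's actual algorithm:

```python
import numpy as np

def spherically_normalized_svd(X, rank, eps=1e-12):
    """Hypothetical sketch only: project each row onto the unit sphere to limit
    the influence of outlying rows, then take a standard truncated SVD.
    Illustrates the general idea, not the algorithm of the cited paper."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xs = X / np.maximum(norms, eps)            # spherical (row-wise) normalization
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]

# A few grossly corrupted rows should have limited influence after normalization.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 50))
X[:5] += 1e3 * rng.standard_normal((5, 50))    # inject outlier rows
U, s, Vt = spherically_normalized_svd(X, rank=3)
```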
arXiv Detail & Related papers (2024-02-15T07:08:11Z)
- A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation [24.558241146742205]
We characterize the large-dimensional spectral behavior of the unfoldings of the data tensor and exhibit relevant signal-to-noise ratios governing the detectability of the principal directions of the signal.
The results make it possible to accurately predict the reconstruction performance of truncated multilinear SVD (MLSVD) in the non-trivial regime.
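For context, a minimal sketch of a truncated MLSVD computed from the SVDs of the tensor unfoldings, applied to an illustrative rank-(1,1,1) signal-plus-noise tensor; shapes, ranks, and noise level are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_mlsvd(T, ranks):
    """Truncated multilinear SVD: keep the top singular subspace of each
    unfolding, then project the tensor onto those subspaces to get the core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Illustrative rank-(1,1,1) spike plus i.i.d. Gaussian noise.
a = rng.standard_normal(30); a /= np.linalg.norm(a)
b = rng.standard_normal(40); b /= np.linalg.norm(b)
c = rng.standard_normal(50); c /= np.linalg.norm(c)
T = 5.0 * np.einsum('i,j,k->ijk', a, b, c) + 0.1 * rng.standard_normal((30, 40, 50))

core, factors = truncated_mlsvd(T, ranks=(1, 1, 1))
print("recovered spike strength:", abs(core.ravel()[0]))
print("mode-0 overlap:", (factors[0][:, 0] @ a) ** 2)
```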
arXiv Detail & Related papers (2024-02-05T16:38:30Z)
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression [75.27846230182885]
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning.
Existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization.
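As a point of reference, a small sketch of momentum-accelerated SGD on an overparameterized least-squares problem; the momentum form, step size, and problem sizes are illustrative and not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 500                              # overparameterized: d > n
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Nesterov-style accelerated SGD with illustrative hyperparameters.
w = np.zeros(d); w_prev = np.zeros(d)
eta, beta = 0.1, 0.9
for t in range(5000):
    i = rng.integers(n)                       # single-sample stochastic gradient
    look_ahead = w + beta * (w - w_prev)
    grad = (X[i] @ look_ahead - y[i]) * X[i]
    w_prev, w = w, look_ahead - eta * grad

X_new = rng.standard_normal((1000, d)) / np.sqrt(d)
print("training loss:", np.mean((X @ w - y) ** 2))
print("test risk:", np.mean((X_new @ (w - w_star)) ** 2))
```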
arXiv Detail & Related papers (2023-11-23T23:02:10Z)
- Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization [48.65416821017865]
This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD).
It converges linearly at a constant rate independent of the condition number of the low-rank object.
It maintains the low per-iteration cost of gradient descent for a variety of tasks.
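A minimal sketch of the ScaledGD update for low-rank matrix factorization, preconditioning each factor's gradient by the other factor's Gram matrix, started from a spectral initialization; the problem sizes, condition number, and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 200, 150, 5

# Ill-conditioned rank-r ground truth (condition number 100, illustrative).
U0, _ = np.linalg.qr(rng.standard_normal((n, r)))
V0, _ = np.linalg.qr(rng.standard_normal((m, r)))
M = (U0 * np.logspace(0, 2, r)) @ V0.T

# Spectral initialization from a slightly perturbed observation of M.
U, s, Vt = np.linalg.svd(M + 1e-3 * rng.standard_normal((n, m)), full_matrices=False)
L = U[:, :r] * np.sqrt(s[:r])
R = Vt[:r].T * np.sqrt(s[:r])

eta = 0.5                                    # illustrative constant step size
for _ in range(50):
    E = L @ R.T - M                          # residual; grad of 0.5*||LR^T - M||_F^2
    # ScaledGD: precondition each factor's gradient with the other factor's Gram matrix.
    L_new = L - eta * (E @ R) @ np.linalg.inv(R.T @ R)
    R_new = R - eta * (E.T @ L) @ np.linalg.inv(L.T @ L)
    L, R = L_new, R_new

print("relative error:", np.linalg.norm(L @ R.T - M) / np.linalg.norm(M))
```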
arXiv Detail & Related papers (2023-10-09T21:16:57Z)
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
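A rough sketch of the inverse-free idea as described above: Hutchinson-style Monte Carlo probes combined with a conjugate-gradient solver replace explicit matrix inversion, here for the gradient of a Gaussian log-likelihood with respect to a noise-variance parameter. The model and parameterization are illustrative, not the paper's.

```python
import numpy as np

def cg_solve(A_mv, b, iters=100, tol=1e-10):
    """Plain conjugate gradients for A x = b, using only matrix-vector products."""
    x = np.zeros_like(b); r = b.copy(); p = r.copy(); rs = r @ r
    for _ in range(iters):
        Ap = A_mv(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
n = 300
W = rng.standard_normal((n, 5))
K = W @ W.T                                   # low-rank "signal" covariance
sigma2 = 0.5                                  # noise-variance parameter of interest
C_mv = lambda v: K @ v + sigma2 * v           # C = K + sigma2 * I as a mat-vec
y = np.linalg.cholesky(K + sigma2 * np.eye(n)) @ rng.standard_normal(n)

# d/d(sigma2) of the Gaussian log-likelihood = -0.5*tr(C^{-1}) + 0.5*||C^{-1} y||^2.
# Estimate the trace with Hutchinson probes; every C^{-1} v is a CG solve.
probes = rng.choice([-1.0, 1.0], size=(20, n))
trace_est = np.mean([z @ cg_solve(C_mv, z) for z in probes])
Cinv_y = cg_solve(C_mv, y)
grad_est = -0.5 * trace_est + 0.5 * (Cinv_y @ Cinv_y)

C = K + sigma2 * np.eye(n)                    # dense reference, for checking only
exact = -0.5 * np.trace(np.linalg.inv(C)) + 0.5 * np.sum(np.linalg.solve(C, y) ** 2)
print("inverse-free estimate:", grad_est, "  exact:", exact)
```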
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
- On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression (KRR) based methods of dataset distillation, such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the random Fourier features (RFF) space coincides with the solution of the original data.
A KRR solution can be generated using this distilled set of instances, which approximates the KRR solution optimized on the full input data.
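For orientation, a minimal sketch of kernel ridge regression in a random Fourier feature (RFF) space, comparing the full-data solution with one fit on a much smaller set; the small set here is just a random subset, standing in for an optimized distilled set such as Kernel Inducing Points.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 2000, 10, 200                      # samples, input dim, number of RFFs
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Random Fourier features approximating an RBF kernel (bandwidth is illustrative).
W = rng.standard_normal((d, D))
b = rng.uniform(0, 2 * np.pi, D)
phi = lambda Z: np.sqrt(2.0 / D) * np.cos(Z @ W + b)

def rff_krr(Z, t, lam=1e-2):
    """Ridge regression in RFF space: (Phi^T Phi + lam*I)^{-1} Phi^T t."""
    P = phi(Z)
    return np.linalg.solve(P.T @ P + lam * np.eye(D), P.T @ t)

w_full = rff_krr(X, y)                        # solution on the full data
idx = rng.choice(n, size=50, replace=False)   # random stand-in for a distilled set
w_small = rff_krr(X[idx], y[idx])             # solution from the small set only

X_test = rng.standard_normal((500, d))
gap = np.mean((phi(X_test) @ w_full - phi(X_test) @ w_small) ** 2)
print("mean squared prediction gap (full vs small set):", gap)
```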
arXiv Detail & Related papers (2023-05-23T14:37:43Z)
- Noise-cleaning the precision matrix of fMRI time series [2.6399022396257794]
We consider several standard noise-cleaning algorithms and compare them on two types of datasets.
The reliability of each algorithm is assessed in terms of test-set likelihood and, in the case of synthetic data, of the distance from the true precision matrix.
We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation.
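A rough illustration of the protocol described above: eigenvalue shrinkage of the sample covariance with a single parameter chosen by held-out Gaussian likelihood. The simple linear shrinker below is a stand-in for illustration only, not the Optimal Rotationally Invariant Estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_val, p = 200, 200, 100

# Illustrative ground-truth covariance and two independent data splits.
A = rng.standard_normal((p, p)) / np.sqrt(p)
true_cov = A @ A.T + np.eye(p)
Lc = np.linalg.cholesky(true_cov)
X_train = rng.standard_normal((n_train, p)) @ Lc.T
X_val = rng.standard_normal((n_val, p)) @ Lc.T

S = np.cov(X_train, rowvar=False)             # noisy sample covariance
evals, evecs = np.linalg.eigh(S)

def avg_loglik(cov, X):
    """Average Gaussian log-likelihood of the rows of X under covariance `cov`."""
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(cov), X)
    return -0.5 * (logdet + quad.mean() + cov.shape[0] * np.log(2 * np.pi))

def shrink(a):
    """Rotationally invariant estimator with a single (linear) shrinkage knob:
    keep the sample eigenvectors, pull the eigenvalues toward their mean."""
    return evecs @ np.diag((1 - a) * evals + a * evals.mean()) @ evecs.T

# Choose the shrinkage parameter by held-out likelihood (cross-validation stand-in).
best = max(np.linspace(0.0, 1.0, 21), key=lambda a: avg_loglik(shrink(a), X_val))
precision = np.linalg.inv(shrink(best))       # noise-cleaned precision matrix
print("selected shrinkage parameter:", best)
```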
arXiv Detail & Related papers (2023-02-06T17:32:17Z)
- Robust Singular Values based on L1-norm PCA [11.706222361809374]
Singular-Value Decomposition (SVD) is a ubiquitous data analysis method in engineering, science, and statistics.
We present a novel robust non-parametric method for SVD and singular-value estimation based on an L1-norm (sum of absolute values) formulation.
arXiv Detail & Related papers (2022-10-21T16:42:49Z)
- Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression.
We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z)
- Robust Differentiable SVD [117.35644933471401]
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms.
Instability arises in the presence of eigenvalues that are close to each other.
We show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using power iteration (PI), without relying on an iterative process.
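The instability comes from the coefficients 1/(λ_i − λ_j) in the eigendecomposition/SVD gradient. Below is a small sketch of the truncated Taylor (power-iteration-style) approximation of those coefficients, which stays bounded when eigenvalues nearly coincide; only this coefficient matrix is shown, not a full backward pass, and the truncation order is illustrative.

```python
import numpy as np

def exact_K(evals):
    """K_ij = 1 / (lambda_i - lambda_j), i != j: the coefficient in the analytic
    eigendecomposition/SVD gradient that blows up for nearly equal eigenvalues."""
    diff = evals[:, None] - evals[None, :]
    K = np.zeros_like(diff)
    off = ~np.eye(len(evals), dtype=bool)
    K[off] = 1.0 / diff[off]
    return K

def taylor_K(evals, order=9):
    """Truncated geometric/Taylor series 1/(l_i - l_j) ~ (1/l_i) * sum_k (l_j/l_i)^k,
    written for l_i > l_j > 0 (the remaining entries follow by antisymmetry).
    Bounded by (order + 1) / l_i even when l_i and l_j nearly coincide."""
    K = np.zeros((len(evals), len(evals)))
    for i, li in enumerate(evals):
        for j, lj in enumerate(evals):
            if li > lj > 0:
                K[i, j] = sum((lj / li) ** k for k in range(order + 1)) / li
                K[j, i] = -K[i, j]
    return K

evals = np.array([1.0, 0.999999, 0.5])         # two nearly equal eigenvalues
print("exact coefficient magnitude:  ", np.abs(exact_K(evals)).max())   # ~1e6
print("truncated coefficient maximum:", np.abs(taylor_K(evals)).max())  # ~order + 1
```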
arXiv Detail & Related papers (2021-04-08T15:04:15Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Algorithmic inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
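A minimal sketch of the setting named above: constant-stepsize SGD with tail-averaging on an overparameterized linear regression, compared against the minimum-norm (unregularized) least-squares solution; all sizes and the step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 1000                              # overparameterized regime, d > n
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Constant-stepsize SGD with tail-averaging of the iterates.
w = np.zeros(d); avg = np.zeros(d)
eta, T = 0.5, 20000
for t in range(T):
    i = rng.integers(n)
    w -= eta * (X[i] @ w - y[i]) * X[i]
    if t >= T // 2:
        avg += w / (T - T // 2)

# Minimum-norm interpolant: what unregularized least squares returns when d > n.
w_ols = np.linalg.pinv(X) @ y

X_new = rng.standard_normal((2000, d)) / np.sqrt(d)
print("SGD (tail-averaged) test risk:", np.mean((X_new @ (avg - w_star)) ** 2))
print("min-norm least-squares test risk:", np.mean((X_new @ (w_ols - w_star)) ** 2))
```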
arXiv Detail & Related papers (2021-03-23T17:15:53Z)