Robust SVD Made Easy: A fast and reliable algorithm for large-scale data
analysis
- URL: http://arxiv.org/abs/2402.09754v1
- Date: Thu, 15 Feb 2024 07:08:11 GMT
- Title: Robust SVD Made Easy: A fast and reliable algorithm for large-scale data
analysis
- Authors: Sangil Han, Kyoowon Kim, Sungkyu Jung
- Abstract summary: Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers.
This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation.
The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The singular value decomposition (SVD) is a crucial tool in machine learning
and statistical data analysis. However, it is highly susceptible to outliers in
the data matrix. Existing robust SVD algorithms often sacrifice speed for
robustness or fail in the presence of only a few outliers. This study
introduces an efficient algorithm, called Spherically Normalized SVD, for
robust SVD approximation that is highly insensitive to outliers,
computationally scalable, and provides accurate approximations of singular
vectors. The proposed algorithm achieves remarkable speed by utilizing only two
applications of a standard reduced-rank SVD algorithm to appropriately scaled
data, significantly outperforming competing algorithms in computation times. To
assess the robustness of the approximated singular vectors and their subspaces
against data contamination, we introduce new notions of breakdown points for
matrix-valued input, including row-wise, column-wise, and block-wise breakdown
points. Theoretical and empirical analyses demonstrate that our algorithm
exhibits higher breakdown points compared to standard SVD and its
modifications. We empirically validate the effectiveness of our approach in
applications such as robust low-rank approximation and robust principal
component analysis of high-dimensional microarray datasets. Overall, our study
presents a highly efficient and robust solution for SVD approximation that
overcomes the limitations of existing algorithms in the presence of outliers.
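The paper's pseudocode is not reproduced here, but the abstract pins down the
overall shape: damp outlier influence by rescaling, then spend only two
reduced-rank SVD calls. Below is a minimal numpy sketch under that reading;
the particular normalization (dividing each entry by its row and column
norms) and all names are our assumptions, not the paper's exact scheme.

```python
import numpy as np

def spherically_normalized_svd(X, rank, eps=1e-12):
    """Hedged two-pass robust SVD sketch (not the paper's exact pseudocode)."""
    # Pass 1 (assumed form of the spherical normalization): divide each
    # entry by its row norm and its column norm, so no single outlying
    # row or column can dominate the spectrum.
    r = np.linalg.norm(X, axis=1, keepdims=True)
    c = np.linalg.norm(X, axis=0, keepdims=True)
    Xs = X / np.maximum(r, eps) / np.maximum(c, eps)

    # First reduced-rank SVD, on the normalized (bounded) data, to get
    # outlier-insensitive singular subspaces.
    U, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    U_r, V_r = U[:, :rank], Vt[:rank].T

    # Second SVD, on the small rank-by-rank projection of the raw data,
    # to recover singular values and vectors on the original scale.
    Uc, s, Vct = np.linalg.svd(U_r.T @ X @ V_r)
    return U_r @ Uc, s, V_r @ Vct.T
```

Under this reading, robustness comes from the normalization step, and the
cost is dominated by one reduced-rank SVD plus a tiny rank-by-rank SVD,
consistent with the claimed speed.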
Related papers
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards; for low-rank Markov Decision Processes (MDPs), it may, for example, characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
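For intuition, the basic spectral estimator behind this line of work is a
truncated SVD of the empirical matrix. A toy sketch (ours; the paper's
bandit and MDP estimators are more involved):

```python
import numpy as np

def spectral_estimate(M_noisy, rank):
    """Rank-r truncated SVD of a noisy empirical matrix."""
    U, s, Vt = np.linalg.svd(M_noisy, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy usage: a rank-1 "expected reward" matrix observed with noise.
rng = np.random.default_rng(0)
M = np.outer(rng.random(50), rng.random(40))         # true low-rank matrix
M_hat = spectral_estimate(M + 0.05 * rng.standard_normal(M.shape), rank=1)
print(np.max(np.abs(M_hat - M)))                     # entry-wise error
```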
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Randomized Dimension Reduction with Statistical Guarantees [0.27195102129095]
This thesis explores some such algorithms for fast execution and efficient data utilization.
We focus on learning algorithms with various incorporations of data augmentation that provably improve generalization and distributional robustness.
Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization.
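Consistency regularization itself is a standard construction; a minimal
sketch of the penalty (ours, with hypothetical names):

```python
import numpy as np

def consistency_penalty(model, x, augment, lam=1.0):
    """Penalize a model for predicting differently on an example
    and on its augmented copy."""
    return lam * np.mean((model(x) - model(augment(x))) ** 2)

# Toy usage: a linear "model" and an additive-noise augmentation.
rng = np.random.default_rng(0)
w = rng.standard_normal(8)
model = lambda x: x @ w
augment = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(consistency_penalty(model, rng.standard_normal((32, 8)), augment))
```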
arXiv Detail & Related papers (2023-10-03T02:01:39Z)
- Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent [30.299284742925852]
This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from observations contaminated by sparse corruptions.
We show that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms.
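The paper's algorithms operate on tensors; as a hedged matrix-case
illustration of the recipe (spectral initialization, then alternating
outlier pruning and preconditioned "scaled" factor updates; all names and
parameter choices below are ours):

```python
import numpy as np

def scaled_gd_rpca(Y, rank, alpha=0.1, eta=0.5, iters=100):
    """Matrix-case sketch of RPCA via scaled gradient descent.
    Model: Y = L + S with L low-rank and S sparse."""
    # Spectral initialization of the factors of L = U V^T.
    U0, s0, V0t = np.linalg.svd(Y, full_matrices=False)
    U = U0[:, :rank] * np.sqrt(s0[:rank])
    V = V0t[:rank].T * np.sqrt(s0[:rank])
    for _ in range(iters):
        # Sparse step: treat the largest alpha-fraction of residual
        # entries as corruptions.
        R = Y - U @ V.T
        thresh = np.quantile(np.abs(R), 1 - alpha)
        S = np.where(np.abs(R) > thresh, R, 0.0)
        # Scaled gradient step on 0.5*||Y - S - U V^T||_F^2: the
        # (V^T V)^{-1} and (U^T U)^{-1} preconditioners make the rate
        # insensitive to the condition number of L.
        E = Y - S - U @ V.T
        U, V = (U + eta * E @ V @ np.linalg.inv(V.T @ V),
                V + eta * E.T @ U @ np.linalg.inv(U.T @ U))
    return U @ V.T, S
```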
arXiv Detail & Related papers (2022-06-18T04:01:32Z)
- Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
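As a concrete instance of a first-order method under sparsity constraints
(illustrative only, not the paper's formulation), iterative hard
thresholding for sparse linear regression:

```python
import numpy as np

def iterative_hard_thresholding(A, y, k, iters=200):
    """Projected gradient descent onto the set of k-sparse vectors."""
    eta = 1.0 / np.linalg.norm(A, 2) ** 2        # step from spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - eta * A.T @ (A @ x - y)          # gradient step on 0.5*||Ax-y||^2
        small = np.argsort(np.abs(x))[:-k]       # all but the k largest entries
        x[small] = 0.0                           # hard-threshold to k-sparse
    return x
```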
arXiv Detail & Related papers (2021-09-23T17:38:24Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
The Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds 80%.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
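Leverage scores have a simple exact characterization that the randomized
transforms approximate. A sketch (ours; the paper composes more structured
sparse and dense transforms than the Gaussian one below, and handles rank
deficiency):

```python
import numpy as np

def leverage_scores_exact(A):
    """Exact scores: squared row norms of an orthonormal basis of col(A)."""
    Q, _ = np.linalg.qr(A)                    # thin QR
    return np.sum(Q**2, axis=1)

def leverage_scores_sketched(A, sketch_size, seed=0):
    """Approximate scores via a small QR of a sketched matrix.
    Assumes sketch_size >= A.shape[1] and full column rank."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((sketch_size, A.shape[0])) / np.sqrt(sketch_size)
    _, R = np.linalg.qr(S @ A)                # cheap QR of the sketch
    AR = np.linalg.solve(R.T, A.T).T          # A @ R^{-1} without inverting R
    return np.sum(AR**2, axis=1)
```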
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Accurate and fast matrix factorization for low-rank learning [4.435094091999926]
We tackle two important challenges related to the accurate partial singular value decomposition (SVD) and numerical rank estimation of a huge matrix.
We use the concepts of Krylov subspaces such as the Golub-Kahan bidiagonalization process as well as Ritz vectors to achieve these goals.
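The paper's refinements (numerical rank estimation, Ritz-vector handling)
are not reproduced here, but the Krylov-subspace baseline it builds on is
available off the shelf, e.g. SciPy's svds:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500)) @ rng.standard_normal((500, 300))

# Partial (rank-k) SVD via a Krylov-subspace method, the same family of
# Golub-Kahan / Lanczos bidiagonalization techniques discussed above.
U, s, Vt = svds(A, k=10)
print(np.sort(s)[::-1])    # svds returns singular values in ascending order
```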
arXiv Detail & Related papers (2021-04-21T22:35:02Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
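The paper's solver is not reproduced here; the sketch below (ours, with
hypothetical names) only illustrates the row-sparsity mechanism: the
$l_{2,p}$ penalty acts on the row norms of the projection matrix, so
features whose rows shrink to zero are discarded.

```python
import numpy as np

def l2p_norm(W, p=0.5):
    """$l_{2,p}$ regularizer: sum of p-th powers of row-wise l2 norms.
    Small p drives whole rows of W (whole features) to zero."""
    return np.sum(np.linalg.norm(W, axis=1) ** p)

def rank_features(X, n_components):
    """Illustrative scoring only: rank features by the row norms of a
    PCA projection matrix (the paper instead minimizes reconstruction
    error plus the l_{2,p} penalty)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                  # features x components
    return np.argsort(np.linalg.norm(W, axis=1))[::-1]
```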
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Approximate Cross-Validation with Low-Rank Data in High Dimensions [35.74302895575951]
Cross-validation is an important tool for model assessment.
Approximate cross-validation (ACV) methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data.
We develop a new ACV algorithm that is fast and accurate in the presence of approximately low-rank (ALR) data.
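The paper's ACV algorithm is not reproduced here; for flavor, the classical
exact leave-one-out identity for ridge regression shows how a single
factorization can stand in for n refits:

```python
import numpy as np

def loo_ridge_residuals(X, y, lam):
    """Exact leave-one-out residuals for ridge regression, no refitting:
    e_i = (y_i - yhat_i) / (1 - h_ii), with H the ridge hat matrix.
    (Classical identity, not the paper's ACV method.)"""
    d = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    return (y - H @ y) / (1.0 - np.diag(H))
```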
arXiv Detail & Related papers (2020-08-24T16:34:05Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
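The thresholding idea is simple to state; a hedged sketch (the paper's
exact scheme and guarantees differ):

```python
import numpy as np

def thresholded_spca(Sigma, k):
    """Keep the k largest-magnitude coordinates of the top eigenvector
    of a (symmetric) covariance matrix, then renormalize."""
    _, vecs = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
    v = vecs[:, -1]                          # leading eigenvector
    x = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]        # k largest |entries|
    x[keep] = v[keep]
    return x / np.linalg.norm(x)
```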
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy.
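A sketch of the two penalties as we read the summary, for a layer stored in
factored form W = U diag(s) V^T (names and weights below are ours):

```python
import numpy as np

def svd_training_penalty(U, s, V, lam_orth=1.0, lam_sparse=1e-3):
    """Singular vector orthogonality regularization (keep U, V orthonormal
    so s stays a valid spectrum) plus an l1 penalty that sparsifies the
    singular values, i.e. lowers the layer's effective rank."""
    orth = (np.linalg.norm(U.T @ U - np.eye(U.shape[1])) ** 2
            + np.linalg.norm(V.T @ V - np.eye(V.shape[1])) ** 2)
    return lam_orth * orth + lam_sparse * np.sum(np.abs(s))
```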
arXiv Detail & Related papers (2020-04-20T02:40:43Z)