Robust SVD Made Easy: A fast and reliable algorithm for large-scale data
analysis
- URL: http://arxiv.org/abs/2402.09754v1
- Date: Thu, 15 Feb 2024 07:08:11 GMT
- Title: Robust SVD Made Easy: A fast and reliable algorithm for large-scale data
analysis
- Authors: Sangil Han, Kyoowon Kim, Sungkyu Jung
- Abstract summary: Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers.
This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation.
The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The singular value decomposition (SVD) is a crucial tool in machine learning
and statistical data analysis. However, it is highly susceptible to outliers in
the data matrix. Existing robust SVD algorithms often sacrifice speed for
robustness or fail in the presence of only a few outliers. This study
introduces an efficient algorithm, called Spherically Normalized SVD, for
robust SVD approximation that is highly insensitive to outliers,
computationally scalable, and provides accurate approximations of singular
vectors. The proposed algorithm achieves remarkable speed by utilizing only two
applications of a standard reduced-rank SVD algorithm to appropriately scaled
data, significantly outperforming competing algorithms in computation times. To
assess the robustness of the approximated singular vectors and their subspaces
against data contamination, we introduce new notions of breakdown points for
matrix-valued input, including row-wise, column-wise, and block-wise breakdown
points. Theoretical and empirical analyses demonstrate that our algorithm
exhibits higher breakdown points compared to standard SVD and its
modifications. We empirically validate the effectiveness of our approach in
applications such as robust low-rank approximation and robust principal
component analysis of high-dimensional microarray datasets. Overall, our study
presents a highly efficient and robust solution for SVD approximation that
overcomes the limitations of existing algorithms in the presence of outliers.
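The paper's pseudocode is not reproduced here, but the abstract pins down the
overall shape: damp outlier influence by rescaling, then spend only two
reduced-rank SVD calls. Below is a minimal numpy sketch under that reading;
the particular normalization (dividing each entry by its row and column
norms) and all names are our assumptions, not the paper's exact scheme.

```python
import numpy as np

def spherically_normalized_svd(X, rank, eps=1e-12):
    """Hedged two-pass robust SVD sketch (not the paper's exact pseudocode)."""
    # Pass 1 (assumed form of the spherical normalization): divide each
    # entry by its row norm and its column norm, so no single outlying
    # row or column can dominate the spectrum.
    r = np.linalg.norm(X, axis=1, keepdims=True)
    c = np.linalg.norm(X, axis=0, keepdims=True)
    Xs = X / np.maximum(r, eps) / np.maximum(c, eps)

    # First reduced-rank SVD, on the normalized (bounded) data, to get
    # outlier-insensitive singular subspaces.
    U, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    U_r, V_r = U[:, :rank], Vt[:rank].T

    # Second SVD, on the small rank-by-rank projection of the raw data,
    # to recover singular values and vectors on the original scale.
    Uc, s, Vct = np.linalg.svd(U_r.T @ X @ V_r)
    return U_r @ Uc, s, V_r @ Vct.T
```

Under this reading, robustness comes from the normalization step, and the
cost is dominated by one reduced-rank SVD plus a tiny rank-by-rank SVD,
consistent with the claimed speed.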
Related papers
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards; for low-rank Markov Decision Processes (MDPs), it may, for example, characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
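For intuition, the basic spectral estimator behind this line of work is a
truncated SVD of the empirical matrix. A toy sketch (ours; the paper's
bandit and MDP estimators are more involved):

```python
import numpy as np

def spectral_estimate(M_noisy, rank):
    """Rank-r truncated SVD of a noisy empirical matrix."""
    U, s, Vt = np.linalg.svd(M_noisy, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy usage: a rank-1 "expected reward" matrix observed with noise.
rng = np.random.default_rng(0)
M = np.outer(rng.random(50), rng.random(40))         # true low-rank matrix
M_hat = spectral_estimate(M + 0.05 * rng.standard_normal(M.shape), rank=1)
print(np.max(np.abs(M_hat - M)))                     # entry-wise error
```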
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Randomized Dimension Reduction with Statistical Guarantees [0.27195102129095]
This thesis explores some such algorithms for fast execution and efficient data utilization.
We focus on learning algorithms with various incorporations of data augmentation that provably improve generalization and distributional robustness.
Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization.
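Consistency regularization itself is a standard construction; a minimal
sketch of the penalty (ours, with hypothetical names):

```python
import numpy as np

def consistency_penalty(model, x, augment, lam=1.0):
    """Penalize a model for predicting differently on an example
    and on its augmented copy."""
    return lam * np.mean((model(x) - model(augment(x))) ** 2)

# Toy usage: a linear "model" and an additive-noise augmentation.
rng = np.random.default_rng(0)
w = rng.standard_normal(8)
model = lambda x: x @ w
augment = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(consistency_penalty(model, rng.standard_normal((32, 8)), augment))
```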
arXiv Detail & Related papers (2023-10-03T02:01:39Z)
- Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent [30.299284742925852]
This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from observations contaminated by sparse corruptions.
We show that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms.
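The paper's algorithms operate on tensors; as a hedged matrix-case
illustration of the recipe (spectral initialization, then alternating
outlier pruning and preconditioned "scaled" factor updates; all names and
parameter choices below are ours):

```python
import numpy as np

def scaled_gd_rpca(Y, rank, alpha=0.1, eta=0.5, iters=100):
    """Matrix-case sketch of RPCA via scaled gradient descent.
    Model: Y = L + S with L low-rank and S sparse."""
    # Spectral initialization of the factors of L = U V^T.
    U0, s0, V0t = np.linalg.svd(Y, full_matrices=False)
    U = U0[:, :rank] * np.sqrt(s0[:rank])
    V = V0t[:rank].T * np.sqrt(s0[:rank])
    for _ in range(iters):
        # Sparse step: treat the largest alpha-fraction of residual
        # entries as corruptions.
        R = Y - U @ V.T
        thresh = np.quantile(np.abs(R), 1 - alpha)
        S = np.where(np.abs(R) > thresh, R, 0.0)
        # Scaled gradient step on 0.5*||Y - S - U V^T||_F^2: the
        # (V^T V)^{-1} and (U^T U)^{-1} preconditioners make the rate
        # insensitive to the condition number of L.
        E = Y - S - U @ V.T
        U, V = (U + eta * E @ V @ np.linalg.inv(V.T @ V),
                V + eta * E.T @ U @ np.linalg.inv(U.T @ U))
    return U @ V.T, S
```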
arXiv Detail & Related papers (2022-06-18T04:01:32Z)
- Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
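As a concrete instance of a first-order method under sparsity constraints
(illustrative only, not the paper's formulation), iterative hard
thresholding for sparse linear regression:

```python
import numpy as np

def iterative_hard_thresholding(A, y, k, iters=200):
    """Projected gradient descent onto the set of k-sparse vectors."""
    eta = 1.0 / np.linalg.norm(A, 2) ** 2        # step from spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - eta * A.T @ (A @ x - y)          # gradient step on 0.5*||Ax-y||^2
        small = np.argsort(np.abs(x))[:-k]       # all but the k largest entries
        x[small] = 0.0                           # hard-threshold to k-sparse
    return x
```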
arXiv Detail & Related papers (2021-09-23T17:38:24Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
The Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds 80%.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
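Leverage scores have a simple exact characterization that the randomized
transforms approximate. A sketch (ours; the paper composes more structured
sparse and dense transforms than the Gaussian one below, and handles rank
deficiency):

```python
import numpy as np

def leverage_scores_exact(A):
    """Exact scores: squared row norms of an orthonormal basis of col(A)."""
    Q, _ = np.linalg.qr(A)                    # thin QR
    return np.sum(Q**2, axis=1)

def leverage_scores_sketched(A, sketch_size, seed=0):
    """Approximate scores via a small QR of a sketched matrix.
    Assumes sketch_size >= A.shape[1] and full column rank."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((sketch_size, A.shape[0])) / np.sqrt(sketch_size)
    _, R = np.linalg.qr(S @ A)                # cheap QR of the sketch
    AR = np.linalg.solve(R.T, A.T).T          # A @ R^{-1} without inverting R
    return np.sum(AR**2, axis=1)
```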
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Accurate and fast matrix factorization for low-rank learning [4.435094091999926]
We tackle two important challenges related to the accurate partial singular value decomposition (SVD) and numerical rank estimation of a huge matrix.
We use the concepts of Krylov subspaces such as the Golub-Kahan bidiagonalization process as well as Ritz vectors to achieve these goals.
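The paper's refinements (numerical rank estimation, Ritz-vector handling)
are not reproduced here, but the Krylov-subspace baseline it builds on is
available off the shelf, e.g. SciPy's svds:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500)) @ rng.standard_normal((500, 300))

# Partial (rank-k) SVD via a Krylov-subspace method, the same family of
# Golub-Kahan / Lanczos bidiagonalization techniques discussed above.
U, s, Vt = svds(A, k=10)
print(np.sort(s)[::-1])    # svds returns singular values in ascending order
```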
arXiv Detail & Related papers (2021-04-21T22:35:02Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
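The paper's solver is not reproduced here; the sketch below (ours, with
hypothetical names) only illustrates the row-sparsity mechanism: the
$l_{2,p}$ penalty acts on the row norms of the projection matrix, so
features whose rows shrink to zero are discarded.

```python
import numpy as np

def l2p_norm(W, p=0.5):
    """$l_{2,p}$ regularizer: sum of p-th powers of row-wise l2 norms.
    Small p drives whole rows of W (whole features) to zero."""
    return np.sum(np.linalg.norm(W, axis=1) ** p)

def rank_features(X, n_components):
    """Illustrative scoring only: rank features by the row norms of a
    PCA projection matrix (the paper instead minimizes reconstruction
    error plus the l_{2,p} penalty)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                  # features x components
    return np.argsort(np.linalg.norm(W, axis=1))[::-1]
```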
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Approximate Cross-Validation with Low-Rank Data in High Dimensions [35.74302895575951]
Cross-validation is an important tool for model assessment.
Approximate cross-validation (ACV) methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data.
We develop a new ACV algorithm that is fast and accurate in the presence of approximately low-rank (ALR) data.
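The paper's ACV algorithm is not reproduced here; for flavor, the classical
exact leave-one-out identity for ridge regression shows how a single
factorization can stand in for n refits:

```python
import numpy as np

def loo_ridge_residuals(X, y, lam):
    """Exact leave-one-out residuals for ridge regression, no refitting:
    e_i = (y_i - yhat_i) / (1 - h_ii), with H the ridge hat matrix.
    (Classical identity, not the paper's ACV method.)"""
    d = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    return (y - H @ y) / (1.0 - np.diag(H))
```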
arXiv Detail & Related papers (2020-08-24T16:34:05Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
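The thresholding idea is simple to state; a hedged sketch (the paper's
exact scheme and guarantees differ):

```python
import numpy as np

def thresholded_spca(Sigma, k):
    """Keep the k largest-magnitude coordinates of the top eigenvector
    of a (symmetric) covariance matrix, then renormalize."""
    _, vecs = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
    v = vecs[:, -1]                          # leading eigenvector
    x = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]        # k largest |entries|
    x[keep] = v[keep]
    return x / np.linalg.norm(x)
```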
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy.
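A sketch of the two penalties as we read the summary, for a layer stored in
factored form W = U diag(s) V^T (names and weights below are ours):

```python
import numpy as np

def svd_training_penalty(U, s, V, lam_orth=1.0, lam_sparse=1e-3):
    """Singular vector orthogonality regularization (keep U, V orthonormal
    so s stays a valid spectrum) plus an l1 penalty that sparsifies the
    singular values, i.e. lowers the layer's effective rank."""
    orth = (np.linalg.norm(U.T @ U - np.eye(U.shape[1])) ** 2
            + np.linalg.norm(V.T @ V - np.eye(V.shape[1])) ** 2)
    return lam_orth * orth + lam_sparse * np.sum(np.abs(s))
```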
arXiv Detail & Related papers (2020-04-20T02:40:43Z)