PCA recovery thresholds in low-rank matrix inference with sparse noise
- URL: http://arxiv.org/abs/2511.11927v1
- Date: Fri, 14 Nov 2025 23:09:54 GMT
- Title: PCA recovery thresholds in low-rank matrix inference with sparse noise
- Authors: Urte Adomaityte, Gabriele Sicuro, Pierpaolo Vivo
- Abstract summary: We study the high-dimensional inference of a rank-one signal corrupted by sparse noise. The noise is modelled as the adjacency matrix of a weighted undirected graph with finite average connectivity in the large size limit.
- Score: 3.7359659592708745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the high-dimensional inference of a rank-one signal corrupted by sparse noise. The noise is modelled as the adjacency matrix of a weighted undirected graph with finite average connectivity in the large size limit. Using the replica method from statistical physics, we analytically compute the typical value of the top eigenvalue, the top eigenvector component density, and the overlap between the signal vector and the top eigenvector. The solution is given in terms of recursive distributional equations for auxiliary probability density functions which can be efficiently solved using a population dynamics algorithm. Specialising the noise matrix to Poissonian and Random Regular degree distributions, the critical signal strength is analytically identified at which a transition happens for the recovery of the signal via the top eigenvector, thus generalising the celebrated BBP transition to the sparse noise case. In the large-connectivity limit, known results for dense noise are recovered. Analytical results are in agreement with numerical diagonalisation of large matrices.
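The numerical side of the abstract (diagonalising a large matrix and measuring the overlap between the signal and the top eigenvector) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the noise is taken to be an Erdős-Rényi graph with mean degree `c` (Poissonian degrees in the large-size limit), the edge weights are Gaussian with variance `1/c`, the signal is `theta * u u^T` with a unit-norm `u`, and the function names `sparse_spiked_matrix` and `top_overlap` are hypothetical. The paper's own normalisation and weight distribution may differ.

```python
import numpy as np


def sparse_spiked_matrix(n, c, theta, rng):
    """Build theta * u u^T + J with J a sparse symmetric weighted adjacency matrix.

    Assumed model (not taken from the paper): Erdos-Renyi edges with
    probability c / n and Gaussian edge weights of variance 1 / c.
    """
    # Unit-norm signal vector.
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)

    # Strict upper triangle: each pair (i, j), i < j, is an edge with prob. c / n.
    mask = np.triu(rng.random((n, n)) < c / n, k=1)
    weights = rng.standard_normal((n, n)) / np.sqrt(c)
    upper = np.where(mask, weights, 0.0)

    # Symmetrise to obtain an undirected weighted graph with zero diagonal.
    noise = upper + upper.T
    return theta * np.outer(u, u) + noise, u


def top_overlap(n, c, theta, seed=0):
    """Return (top eigenvalue, |<top eigenvector, u>|) for one random instance."""
    rng = np.random.default_rng(seed)
    matrix, u = sparse_spiked_matrix(n, c, theta, rng)
    # eigh returns eigenvalues in ascending order, so the last column is the top one.
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvals[-1], abs(eigvecs[:, -1] @ u)


if __name__ == "__main__":
    for theta in (0.0, 1.0, 2.0, 5.0):
        lam, overlap = top_overlap(n=400, c=5.0, theta=theta, seed=1)
        print(f"theta={theta:.1f}  top eigenvalue={lam:.3f}  overlap={overlap:.3f}")
```

Sweeping `theta` at fixed `c` and averaging over seeds gives an empirical picture of the recovery transition; the sketch does not implement the replica or population dynamics calculation described in the abstract.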
Related papers
- Computational and statistical lower bounds for low-rank estimation under general inhomogeneous noise [3.582937960220228]
We study the computational optimality of a spectral algorithm for a low-rank signal matrix corrupted by additive i.i.d. We do not assume that the variance profile has a block structure, and suggest that the same spectral algorithm might remain optimal for quite general profiles.
arXiv Detail & Related papers (2025-10-09T17:53:59Z)
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
- Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z)
- Detection problems in the spiked matrix models [15.125686694430573]
We first show that the principal component analysis can be improved by entrywise pre-transforming the data matrix if the noise is non-Gaussian.
We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.
arXiv Detail & Related papers (2023-01-12T23:46:41Z)
- Bayes-optimal limits in structured PCA, and how to reach them [21.3083877172595]
We study the paradigmatic matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
We provide the first characterization of the Bayes-optimal limits of inference in this model.
We propose a novel approximate message passing algorithm (AMP), inspired by the theory of Adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2022-10-03T21:31:41Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- Detection of Signal in the Spiked Rectangular Models [8.185918509343818]
We show that the principal component analysis can be improved by pre-transforming the matrix entries if the noise is non-Gaussian.
We also propose a hypothesis test to detect the presence of signal with low computational complexity.
arXiv Detail & Related papers (2021-04-28T01:15:45Z)
- Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps [95.62172085878132]
Eigenvector perturbation analysis plays a vital role in various statistical data science applications.
We develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector.
In order to mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators.
arXiv Detail & Related papers (2021-04-07T17:55:10Z)
- Probabilistic Simplex Component Analysis [66.30587591100566]
PRISM is a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data.
The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning.
arXiv Detail & Related papers (2021-03-18T05:39:00Z)
- Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks [13.879536370173506]
We study symmetric spiked matrix models with respect to a general class of noise distributions.
We exhibit an estimator that works for heavy-tailed noise up to the BBP threshold that is optimal even for Gaussian noise.
Our estimator can be evaluated in time by counting self-avoiding walks via a color-coding technique.
arXiv Detail & Related papers (2020-08-31T16:57:20Z)
- All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation [35.035853993422506]
We analyze the approximate message passing algorithm in a sparse regime.
We find all-or-nothing phase transitions for the minimum and mean-square errors.
In the sparse regime the statistical-to-algorithmic gap diverges indicating that sparse recovery is hard for approximate message passing.
arXiv Detail & Related papers (2020-06-14T18:38:34Z)
- Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)