Robust Matrix Completion with Heavy-tailed Noise
- URL: http://arxiv.org/abs/2206.04276v1
- Date: Thu, 9 Jun 2022 04:48:48 GMT
- Title: Robust Matrix Completion with Heavy-tailed Noise
- Authors: Bingyan Wang, Jianqing Fan
- Abstract summary: This paper studies low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise.
In this paper, we adopt an adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors.
We prove that under merely a second moment condition on the error, the Euclidean error falls geometrically fast until achieving a minimax-optimal statistical estimation error.
- Score: 0.5837881923712392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies low-rank matrix completion in the presence of heavy-tailed
and possibly asymmetric noise, where we aim to estimate an underlying low-rank
matrix given a set of highly incomplete noisy entries. Though the matrix
completion problem has attracted much attention in the past decade, there is
still a lack of theoretical understanding when the observations are contaminated
by heavy-tailed noise. Prior theory falls short of explaining the empirical
results and is unable to capture the optimal dependence of the estimation error
on the noise level. In this paper, we adopt an adaptive Huber loss to
accommodate heavy-tailed noise, which is robust against large and possibly
asymmetric errors when the parameter in the loss function is carefully designed
to balance the Huberization biases and robustness to outliers. Then, we propose
an efficient nonconvex algorithm via a balanced low-rank Burer-Monteiro matrix
factorization and gradient descent with robust spectral initialization. We prove
that under merely a bounded second moment condition on the error distributions,
rather than the sub-Gaussian assumption, the Euclidean error of the iterates
generated by the proposed algorithm decreases geometrically fast until it achieves
a minimax-optimal statistical estimation error, which has the same order as
that in the sub-Gaussian case. The key technique behind this significant
advancement is a powerful leave-one-out analysis framework. The theoretical
results are corroborated by our simulation studies.
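To make the pipeline concrete, the following is a minimal Python sketch of Huber-loss gradient descent on a balanced Burer-Monteiro factorization. The function names, the fixed step size, and the plain-SVD initialization are illustrative assumptions; the paper itself uses a robust spectral initialization and a carefully tuned Huberization parameter.

```python
import numpy as np

def huber_grad(residual, tau):
    """Gradient (psi-function) of the Huber loss with threshold tau."""
    return np.clip(residual, -tau, tau)

def huber_matrix_completion(M_obs, mask, rank, tau, step=0.1, n_iters=500, reg=0.5):
    """Minimal sketch: Huber-loss matrix completion via a balanced Burer-Monteiro
    factorization M ~ X @ Y.T, fitted by gradient descent on the observed entries."""
    p_hat = mask.mean()                               # estimated sampling rate
    M_fill = np.where(mask, M_obs, 0.0) / p_hat       # zero-filled, rescaled observations

    # Spectral initialization (the paper uses a robust/truncated variant; plain SVD here).
    U, s, Vt = np.linalg.svd(M_fill, full_matrices=False)
    X = U[:, :rank] * np.sqrt(s[:rank])
    Y = Vt[:rank].T * np.sqrt(s[:rank])

    for _ in range(n_iters):
        # Huberized residuals on observed entries only.
        R = huber_grad(np.where(mask, X @ Y.T - M_obs, 0.0), tau) / p_hat
        B = X.T @ X - Y.T @ Y                         # balancing term keeps the two factors comparable
        X, Y = X - step * (R @ Y + reg * X @ B), Y - step * (R.T @ X - reg * Y @ B)
    return X @ Y.T
```

With a very large threshold tau the update reduces to ordinary least-squares gradient descent; a finite tau caps the influence of any single heavy-tailed residual, which is the bias-versus-robustness trade-off described above.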
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
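As a generic illustration of the gradient-free setting, here is a minimal two-point (finite-difference) zero-order estimator with a plain descent loop; it is not the accelerated method of the paper, and all names and parameters are assumptions.

```python
import numpy as np

def two_point_grad(f, x, h=1e-4, rng=np.random.default_rng(0)):
    """Two-point zero-order gradient estimate: uses only function values of f."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)                            # random unit direction
    return x.size * (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u

def zero_order_sgd(f, x0, step=0.05, n_iters=200):
    """Plain gradient-free descent loop driven by the two-point estimator."""
    x = x0.astype(float).copy()
    for _ in range(n_iters):
        x = x - step * two_point_grad(f, x)
    return x
```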
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Matrix Denoising with Doubly Heteroscedastic Noise: Fundamental Limits and Optimal Spectral Methods [24.06775799553418]
We study the matrix denoising problem of estimating the singular vectors of a rank-$1$ signal corrupted by noise with both column and row correlations.
Our work establishes the information-theoretic and algorithmic limits of matrix denoising with doubly heteroscedastic noise.
arXiv Detail & Related papers (2024-05-22T18:38:10Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
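For reference, a minimal sketch of the standard Noise-Contrastive Estimation objective: data points are classified against samples from a known noise distribution using the model/noise log-density ratio as the logit. The question studied in the paper, how to choose the noise distribution, is orthogonal to this implementation, and the function names are illustrative.

```python
import numpy as np

def nce_loss(log_model, log_noise, x_data, x_noise):
    """Standard NCE objective: logistic classification of data against noise samples,
    with logit = log p_model(x) - log p_noise(x) for the energy-based model."""
    logit_data = log_model(x_data) - log_noise(x_data)
    logit_noise = log_model(x_noise) - log_noise(x_noise)
    # -log sigmoid(z) = log(1 + exp(-z));  -log(1 - sigmoid(z)) = log(1 + exp(z))
    return np.logaddexp(0.0, -logit_data).mean() + np.logaddexp(0.0, logit_noise).mean()
```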
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation? [21.3083877172595]
We consider the problem of estimating a rank-1 signal corrupted by structured rotationally invariant noise.
We take a step towards understanding the effect of a strong source of mismatch: the noise statistics.
We show that this performance gap is due to an incorrect estimation of the signal norm.
arXiv Detail & Related papers (2022-05-20T07:54:21Z)
- Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation [15.389011827844572]
Low-rank linear shrinkage estimation under heavy-tailed noise is challenging, both computationally and statistically.
We introduce a novel sub-gradient method (RsGrad) which is not only computationally efficient but also statistically optimal.
arXiv Detail & Related papers (2022-03-02T09:05:15Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
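A rough sketch of how that prescription could be applied in practice: inject SGLD noise whose covariance is the matrix square root of an empirical gradient covariance. The helper name, the plug-in covariance estimate, and the conventional sqrt(2 * lr) scaling are assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.linalg import sqrtm

def sgld_step_anisotropic(theta, grad, grad_samples, lr=1e-3, rng=np.random.default_rng(0)):
    """One SGLD-style update whose injected-noise covariance is the matrix square
    root of the empirical gradient covariance, as in the result summarized above."""
    G = np.asarray(grad_samples)                      # rows: per-example gradients
    C = np.cov(G, rowvar=False)                       # empirical gradient covariance (d x d)
    Sigma = np.real(sqrtm(C))                         # noise covariance ~ C^{1/2}
    noise = rng.multivariate_normal(np.zeros(theta.size), Sigma)
    return theta - lr * grad + np.sqrt(2.0 * lr) * noise
```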
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
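For context, the basic gradient-clipping step referred to above looks as follows; the paper's contribution is the stepsize rules and the high-probability analysis, which this generic sketch does not reproduce.

```python
import numpy as np

def clipped_step(x, grad, step, clip_level):
    """One first-order step with gradient clipping: rescale the (stochastic)
    gradient whenever its Euclidean norm exceeds clip_level."""
    norm = np.linalg.norm(grad)
    if norm > clip_level:
        grad = grad * (clip_level / norm)
    return x - step * grad
```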
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- On the Role of Entropy-based Loss for Learning Causal Structures with Continuous Optimization [27.613220411996025]
A method with a non-combinatorial directed acyclicity constraint, called NOTEARS, formulates the causal structure learning problem as a continuous optimization problem using a least-squares loss.
We show that the violation of the Gaussian noise assumption will hinder the causal direction identification.
We propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution.
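For orientation, a minimal sketch of the NOTEARS ingredients mentioned above: the least-squares score and the smooth acyclicity function h(W) = tr(exp(W * W)) - d. The entropy-based loss proposed in the paper would replace the least-squares score; function names are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS acyclicity function h(W) = tr(exp(W * W)) - d (elementwise square),
    which equals zero exactly when the weighted adjacency matrix W encodes a DAG."""
    return np.trace(expm(W * W)) - W.shape[0]

def least_squares_score(W, X):
    """Least-squares score of the original NOTEARS formulation (n samples, d variables)."""
    n = X.shape[0]
    return 0.5 / n * np.linalg.norm(X - X @ W, "fro") ** 2
```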
arXiv Detail & Related papers (2021-06-05T08:29:51Z)
- Robust Compressed Sensing using Generative Models [98.64228459705859]
In this paper we propose an algorithm inspired by the Median-of-Means (MOM).
Our algorithm guarantees recovery for heavy-tailed data, even in the presence of outliers.
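The Median-of-Means idea itself is simple; a minimal sketch of the plain MoM mean estimator is below. How it is embedded into the compressed-sensing recovery algorithm is specific to the paper and is not shown here.

```python
import numpy as np

def median_of_means(x, n_blocks=10, rng=np.random.default_rng(0)):
    """Median-of-Means estimate of a mean: shuffle the sample, split it into
    blocks, average each block, and return the median of the block means."""
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return np.median([b.mean() for b in blocks])
```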
arXiv Detail & Related papers (2020-06-16T19:07:41Z)
- Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data [20.31071912733189]
This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation.
We show that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_\infty$ loss.
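One common convex formulation of robust PCA with missing data combines a nuclear norm on the low-rank part, an l1 penalty on sparse outliers, and a squared data-fit term on observed entries. The sketch below (using cvxpy) is that generic formulation and is not guaranteed to match the exact program analyzed in the paper; names and weights are illustrative.

```python
import numpy as np
import cvxpy as cp

def robust_pca_convex(M_obs, mask, lam=0.1, rho=0.5):
    """Generic convex program for robust PCA with missing data: decompose the
    observed entries into a low-rank part L (nuclear norm) plus a sparse
    outlier part S (l1 norm), with a squared data-fit term on observed entries."""
    mask = np.asarray(mask, dtype=float)
    L = cp.Variable(M_obs.shape)
    S = cp.Variable(M_obs.shape)
    residual = cp.multiply(mask, L + S - M_obs)       # only observed entries contribute
    objective = cp.Minimize(cp.normNuc(L) + lam * cp.norm1(S) + rho * cp.sum_squares(residual))
    cp.Problem(objective).solve()
    return L.value, S.value
```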
arXiv Detail & Related papers (2020-01-15T18:54:29Z)