Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation
- URL: http://arxiv.org/abs/2203.00953v4
- Date: Thu, 11 May 2023 01:20:06 GMT
- Title: Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation
- Authors: Yinan Shen and Jingyang Li and Jian-Feng Cai and Dong Xia
- Abstract summary: Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically.
We introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient but also statistically optimal.
- Score: 15.389011827844572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-rank matrix estimation under heavy-tailed noise is challenging, both
computationally and statistically. Convex approaches have been proven
statistically optimal but suffer from high computational costs, especially
since robust loss functions are usually non-smooth. More recently,
computationally fast non-convex approaches via sub-gradient descent are
proposed, which, unfortunately, fail to deliver a statistically consistent
estimator even under sub-Gaussian noise. In this paper, we introduce a novel
Riemannian sub-gradient (RsGrad) algorithm which is not only computationally
efficient with linear convergence but also statistically optimal, whether the
noise is Gaussian or heavy-tailed. Convergence theory is established for a general
framework and specific applications to absolute loss, Huber loss, and quantile
loss are investigated. Compared with existing non-convex methods, ours reveals
a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves
as in a typical non-smooth optimization that requires gradually decaying
stepsizes. However, phase one only delivers a statistically sub-optimal
estimator, a limitation already observed in the existing literature. Interestingly,
during phase two, RsGrad converges linearly as if minimizing a smooth and
strongly convex objective function and thus a constant stepsize suffices.
Underlying the phase-two convergence is the smoothing effect of random noise on
the non-smooth robust losses in a region close to, but not too close to, the truth.
Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed
noise where a statistically optimal rate is attainable with the same phenomenon
of dual-phase convergence, and a novel shrinkage-based second-order moment
method is guaranteed to deliver a warm initialization. Numerical simulations
confirm our theoretical discovery and showcase the superiority of RsGrad over
prior methods.
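To make the dual-phase schedule concrete, below is a minimal, illustrative sketch of Riemannian sub-gradient descent with the Huber loss on a toy rank-r matrix denoising problem. This is not the authors' implementation: the entrywise observation model, the function names (huber_subgrad, svd_retract, rsgrad_denoise), the Huber parameter, and all stepsize constants are assumptions chosen only to show the decaying-then-constant stepsize pattern described in the abstract.

```python
# A minimal sketch (not the authors' code) of Riemannian sub-gradient descent
# for robust rank-r matrix denoising with the Huber loss, illustrating the
# dual-phase stepsize schedule: geometrically decaying stepsizes in phase one,
# a constant stepsize in phase two.
import numpy as np

def huber_subgrad(r, delta=1.0):
    """Entrywise sub-gradient of the Huber loss at residual r (delta is a placeholder value)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def svd_retract(M, rank):
    """Retract onto the manifold of rank-`rank` matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], U[:, :rank], Vt[:rank].T

def rsgrad_denoise(Y, rank, iters=200, eta0=0.5, decay=0.97,
                   phase_switch=60, eta_const=0.05):
    """Riemannian sub-gradient descent on min_{rank(M)=r} sum_ij huber(M_ij - Y_ij)."""
    M, U, V = svd_retract(Y, rank)          # spectral warm start
    for t in range(iters):
        G = huber_subgrad(M - Y)            # Euclidean sub-gradient of the loss
        # Project onto the tangent space of the rank-r manifold at M = U S V^T
        PG = U @ (U.T @ G) + (G @ V) @ V.T - U @ (U.T @ G @ V) @ V.T
        # Phase one: decaying stepsizes; phase two: constant stepsize
        eta = eta0 * decay**t if t < phase_switch else eta_const
        M, U, V = svd_retract(M - eta * PG, rank)
    return M

# Toy usage: recover a rank-3 matrix corrupted by heavy-tailed (t-distributed) noise.
rng = np.random.default_rng(0)
M_star = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 60))
Y = M_star + rng.standard_t(df=2, size=M_star.shape)
M_hat = rsgrad_denoise(Y, rank=3)
print("relative error:", np.linalg.norm(M_hat - M_star) / np.linalg.norm(M_star))
```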
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
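For context, the classical trust-region step solves a quadratic subproblem built from (possibly inexact) gradient and Hessian estimates; the notation below is ours, not the paper's:

$$ \min_{p \in \mathbb{R}^d} \; m_k(p) \;=\; f(x_k) + g_k^\top p + \tfrac{1}{2}\, p^\top H_k p \quad \text{subject to } \|p\| \le \Delta_k, $$

where $g_k \approx \nabla f(x_k)$ and $H_k \approx \nabla^2 f(x_k)$ are inexact estimates and $\Delta_k$ is the trust-region radius.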
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
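For reference, the standard NCE objective (notation ours, not the paper's) classifies real data against artificial noise drawn from a known distribution $p_n$ with noise-to-data ratio $\nu$:

$$ \max_{\theta}\;\; \mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log \frac{p_\theta(x)}{p_\theta(x)+\nu\, p_n(x)}\right] \;+\; \nu\,\mathbb{E}_{x\sim p_n}\!\left[\log \frac{\nu\, p_n(x)}{p_\theta(x)+\nu\, p_n(x)}\right], $$

which is exactly a logistic loss for distinguishing data samples from noise samples.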
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent
Estimator [0.0]
In this paper, we investigate the theoretical properties of stochastic gradient descent (SGD) for statistical inference in the context of nonconvex problems.
We propose two inferential procedures for objectives that may contain multiple local minima.
arXiv Detail & Related papers (2023-06-03T22:08:10Z) - Computationally Efficient and Statistically Optimal Robust
High-Dimensional Linear Regression [15.389011827844572]
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically.
In this paper, we introduce an algorithm that is statistically optimal whether the noise is Gaussian or heavy-tailed with only a bounded 1 + epsilon moment.
arXiv Detail & Related papers (2023-05-10T14:31:03Z) - Robust Matrix Completion with Heavy-tailed Noise [0.5837881923712392]
This paper studies low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise.
In this paper, we adopt the adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors.
We prove that under merely a second moment condition on the error, the Euclidean error falls geometrically fast until achieving a minimax-optimal statistical estimation error.
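For reference, the Huber loss with robustification parameter $\tau$ (notation ours) is quadratic near zero and linear in the tails, which is what makes it robust to large errors; "adaptive" refers to letting $\tau$ grow with the sample size to trade off bias against robustness:

$$ \ell_\tau(x) \;=\; \begin{cases} x^2/2, & |x|\le \tau,\\ \tau |x| - \tau^2/2, & |x| > \tau. \end{cases} $$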
arXiv Detail & Related papers (2022-06-09T04:48:48Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to superlinear local convergence at an optimal stage.
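As a rough illustration of what a weighted Hessian-averaging scheme looks like (the specific weights are the paper's contribution and are not reproduced here), the Newton direction is formed from an average of past stochastic Hessian estimates rather than the latest one alone:

$$ \bar H_t \;=\; \sum_{s=1}^{t} w_{s,t}\, \hat H_s, \qquad \sum_{s=1}^{t} w_{s,t} = 1, \qquad x_{t+1} \;=\; x_t - \bar H_t^{-1} \nabla f(x_t). $$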
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
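For context, gradient clipping replaces the stochastic gradient $g_t$ by a norm-capped version before the usual step; the clipping level $\lambda$ and stepsize $\gamma_t$ below are generic placeholders, not the paper's proposed rules:

$$ \tilde g_t \;=\; \min\!\Big(1, \tfrac{\lambda}{\|g_t\|}\Big)\, g_t, \qquad x_{t+1} \;=\; x_t - \gamma_t\, \tilde g_t. $$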
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.