Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation
- URL: http://arxiv.org/abs/2203.00953v4
- Date: Thu, 11 May 2023 01:20:06 GMT
- Title: Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation
- Authors: Yinan Shen and Jingyang Li and Jian-Feng Cai and Dong Xia
- Abstract summary: Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically.
We introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient but also statistically optimal.
- Score: 15.389011827844572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-rank matrix estimation under heavy-tailed noise is challenging, both
computationally and statistically. Convex approaches have been proven
statistically optimal but suffer from high computational costs, especially
since robust loss functions are usually non-smooth. More recently,
computationally fast non-convex approaches via sub-gradient descent are
proposed, which, unfortunately, fail to deliver a statistically consistent
estimator even under sub-Gaussian noise. In this paper, we introduce a novel
Riemannian sub-gradient (RsGrad) algorithm which is not only computationally
efficient with linear convergence but also statistically optimal, whether the
noise is Gaussian or heavy-tailed. Convergence theory is established for a general
framework and specific applications to absolute loss, Huber loss, and quantile
loss are investigated. Compared with existing non-convex methods, ours reveals
a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves
as in a typical non-smooth optimization that requires gradually decaying
stepsizes. However, phase one only delivers a statistically sub-optimal
estimator, a limitation already observed in the existing literature. Interestingly,
during phase two, RsGrad converges linearly as if minimizing a smooth and
strongly convex objective function and thus a constant stepsize suffices.
Underlying the phase-two convergence is the smoothing effect of random noise on
the non-smooth robust losses in a region close to, but not too close to, the truth.
Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed
noise where a statistically optimal rate is attainable with the same phenomenon
of dual-phase convergence, and a novel shrinkage-based second-order moment
method is guaranteed to deliver a warm initialization. Numerical simulations
confirm our theoretical discovery and showcase the superiority of RsGrad over
prior methods.
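To make the dual-phase schedule concrete, below is a minimal, illustrative sketch of Riemannian sub-gradient descent with the Huber loss on a toy rank-r matrix denoising problem. This is not the authors' implementation: the entrywise observation model, the function names (huber_subgrad, svd_retract, rsgrad_denoise), the Huber parameter, and all stepsize constants are assumptions chosen only to show the decaying-then-constant stepsize pattern described in the abstract.

```python
# A minimal sketch (not the authors' code) of Riemannian sub-gradient descent
# for robust rank-r matrix denoising with the Huber loss, illustrating the
# dual-phase stepsize schedule: geometrically decaying stepsizes in phase one,
# a constant stepsize in phase two.
import numpy as np

def huber_subgrad(r, delta=1.0):
    """Entrywise sub-gradient of the Huber loss at residual r (delta is a placeholder value)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def svd_retract(M, rank):
    """Retract onto the manifold of rank-`rank` matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], U[:, :rank], Vt[:rank].T

def rsgrad_denoise(Y, rank, iters=200, eta0=0.5, decay=0.97,
                   phase_switch=60, eta_const=0.05):
    """Riemannian sub-gradient descent on min_{rank(M)=r} sum_ij huber(M_ij - Y_ij)."""
    M, U, V = svd_retract(Y, rank)          # spectral warm start
    for t in range(iters):
        G = huber_subgrad(M - Y)            # Euclidean sub-gradient of the loss
        # Project onto the tangent space of the rank-r manifold at M = U S V^T
        PG = U @ (U.T @ G) + (G @ V) @ V.T - U @ (U.T @ G @ V) @ V.T
        # Phase one: decaying stepsizes; phase two: constant stepsize
        eta = eta0 * decay**t if t < phase_switch else eta_const
        M, U, V = svd_retract(M - eta * PG, rank)
    return M

# Toy usage: recover a rank-3 matrix corrupted by heavy-tailed (t-distributed) noise.
rng = np.random.default_rng(0)
M_star = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 60))
Y = M_star + rng.standard_t(df=2, size=M_star.shape)
M_hat = rsgrad_denoise(Y, rank=3)
print("relative error:", np.linalg.norm(M_hat - M_star) / np.linalg.norm(M_star))
```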
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
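For context, the classical trust-region step solves a quadratic subproblem built from (possibly inexact) gradient and Hessian estimates; the notation below is ours, not the paper's:

$$ \min_{p \in \mathbb{R}^d} \; m_k(p) \;=\; f(x_k) + g_k^\top p + \tfrac{1}{2}\, p^\top H_k p \quad \text{subject to } \|p\| \le \Delta_k, $$

where $g_k \approx \nabla f(x_k)$ and $H_k \approx \nabla^2 f(x_k)$ are inexact estimates and $\Delta_k$ is the trust-region radius.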
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
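For reference, the standard NCE objective (notation ours, not the paper's) classifies real data against artificial noise drawn from a known distribution $p_n$ with noise-to-data ratio $\nu$:

$$ \max_{\theta}\;\; \mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log \frac{p_\theta(x)}{p_\theta(x)+\nu\, p_n(x)}\right] \;+\; \nu\,\mathbb{E}_{x\sim p_n}\!\left[\log \frac{\nu\, p_n(x)}{p_\theta(x)+\nu\, p_n(x)}\right], $$

which is exactly a logistic loss for distinguishing data samples from noise samples.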
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent
Estimator [0.0]
In this paper, we investigate the theoretical properties of stochastic gradient descent (SGD) for statistical inference in the context of nonconvex problems.
We propose two inferential procedures for objectives that may contain multiple local minima.
arXiv Detail & Related papers (2023-06-03T22:08:10Z) - Computationally Efficient and Statistically Optimal Robust
High-Dimensional Linear Regression [15.389011827844572]
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically.
In this paper, we introduce an algorithm that is statistically optimal whether the noise is Gaussian or heavy-tailed with only a bounded 1 + epsilon moment.
arXiv Detail & Related papers (2023-05-10T14:31:03Z) - Robust Matrix Completion with Heavy-tailed Noise [0.5837881923712392]
This paper studies low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise.
In this paper, we adopt the adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors.
We prove that under merely a second moment condition on the error, the Euclidean error falls geometrically fast until achieving a minimax-optimal statistical estimation error.
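For reference, the Huber loss with robustification parameter $\tau$ (notation ours) is quadratic near zero and linear in the tails, which is what makes it robust to large errors; "adaptive" refers to letting $\tau$ grow with the sample size to trade off bias against robustness:

$$ \ell_\tau(x) \;=\; \begin{cases} x^2/2, & |x|\le \tau,\\ \tau |x| - \tau^2/2, & |x| > \tau. \end{cases} $$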
arXiv Detail & Related papers (2022-06-09T04:48:48Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to superlinear local convergence at an optimal stage.
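As a rough illustration of what a weighted Hessian-averaging scheme looks like (the specific weights are the paper's contribution and are not reproduced here), the Newton direction is formed from an average of past stochastic Hessian estimates rather than the latest one alone:

$$ \bar H_t \;=\; \sum_{s=1}^{t} w_{s,t}\, \hat H_s, \qquad \sum_{s=1}^{t} w_{s,t} = 1, \qquad x_{t+1} \;=\; x_t - \bar H_t^{-1} \nabla f(x_t). $$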
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
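For context, gradient clipping replaces the stochastic gradient $g_t$ by a norm-capped version before the usual step; the clipping level $\lambda$ and stepsize $\gamma_t$ below are generic placeholders, not the paper's proposed rules:

$$ \tilde g_t \;=\; \min\!\Big(1, \tfrac{\lambda}{\|g_t\|}\Big)\, g_t, \qquad x_{t+1} \;=\; x_t - \gamma_t\, \tilde g_t. $$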
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.