Computationally Efficient and Statistically Optimal Robust
High-Dimensional Linear Regression
- URL: http://arxiv.org/abs/2305.06199v1
- Date: Wed, 10 May 2023 14:31:03 GMT
- Title: Computationally Efficient and Statistically Optimal Robust
High-Dimensional Linear Regression
- Authors: Yinan Shen, Jingyang Li, Jian-Feng Cai, Dong Xia
- Abstract summary: High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically.
In this paper, we introduce a projected sub-gradient descent algorithm for both the sparse linear regression and low-rank linear regression problems, which is statistically optimal whether the noise is Gaussian or heavy-tailed with a finite 1 + epsilon moment.
- Score: 15.389011827844572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-dimensional linear regression under heavy-tailed noise or outlier
corruption is challenging, both computationally and statistically. Convex
approaches have been proven statistically optimal but suffer from high
computational costs, especially since the robust loss functions are usually
non-smooth. More recently, computationally fast non-convex approaches via
sub-gradient descent have been proposed, which, unfortunately, fail to deliver a
statistically consistent estimator even under sub-Gaussian noise. In this
paper, we introduce a projected sub-gradient descent algorithm for both the
sparse linear regression and low-rank linear regression problems. The algorithm
is not only computationally efficient with linear convergence but also
statistically optimal, be the noise Gaussian or heavy-tailed with a finite 1 +
epsilon moment. The convergence theory is established for a general framework
and its specific applications to absolute loss, Huber loss and quantile loss
are investigated. Compared with existing non-convex methods, ours reveals a
surprising phenomenon of two-phase convergence. In phase one, the algorithm
behaves as in typical non-smooth optimization that requires gradually decaying
stepsizes. However, phase one only delivers a statistically sub-optimal
estimator, which is already observed in the existing literature. Interestingly,
during phase two, the algorithm converges linearly as if minimizing a smooth
and strongly convex objective function, and thus a constant stepsize suffices.
Underlying the phase-two convergence is the smoothing effect of random noise to
the non-smooth robust losses in an area close but not too close to the truth.
Numerical simulations confirm our theoretical discovery and showcase the
superiority of our algorithm over prior methods.
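For concreteness, here is a minimal Python sketch of a projected sub-gradient descent of the kind described above, specialized to sparse regression with the Huber loss. The hard-thresholding projection, the Huber threshold delta, the phase lengths, and the stepsize values are illustrative assumptions rather than the paper's prescribed choices; the two-phase schedule (decaying stepsizes, then a constant stepsize) mirrors the two-phase convergence described in the abstract.

```python
# Hedged sketch of projected sub-gradient descent for sparse robust regression.
# Assumptions (not taken from the paper): Huber loss with threshold delta,
# hard-thresholding projection onto s-sparse vectors, and fixed phase lengths
# T1 (decaying stepsizes) and T2 (constant stepsize).
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def huber_subgrad(r, delta):
    """Sub-gradient of the Huber loss evaluated at residuals r."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def robust_sparse_regression(X, y, s, delta=1.0, T1=200, T2=200,
                             eta0=0.1, decay=0.97, eta_const=0.05):
    n, p = X.shape
    beta = np.zeros(p)
    # Phase one: gradually decaying stepsizes, as in typical non-smooth optimization.
    eta = eta0
    for _ in range(T1):
        g = X.T @ huber_subgrad(X @ beta - y, delta) / n
        beta = hard_threshold(beta - eta * g, s)
        eta *= decay
    # Phase two: a constant stepsize, reflecting the linear convergence the
    # abstract attributes to the noise-induced smoothing near the truth.
    for _ in range(T2):
        g = X.T @ huber_subgrad(X @ beta - y, delta) / n
        beta = hard_threshold(beta - eta_const * g, s)
    return beta
```

In practice the sparsity level s would be set from prior knowledge or tuned, and the switch from phase one to phase two would be governed by the theory's stepsize conditions rather than a fixed iteration count.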
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm
Regularization [44.54772242784423]
We develop an efficient algorithm for low-rank tensor learning with nonconvex overlapped nuclear norm regularization.
The proposed algorithm can avoid expensive folding/unfolding operations.
Experiments show that the proposed algorithm is efficient and uses less space than the existing state-of-the-art methods.
arXiv Detail & Related papers (2022-05-06T07:47:10Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation [15.389011827844572]
Low-rank matrix and tensor estimation under heavy-tailed noise is challenging, both computationally and statistically.
We introduce a novel sub-gradient algorithm (RsGrad) which is not only computationally efficient but also statistically optimal.
arXiv Detail & Related papers (2022-03-02T09:05:15Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Provably Convergent Working Set Algorithm for Non-Convex Regularized
Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate high computational gain compared to the full-problem solver, for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z) - The estimation error of general first order methods [12.472245917779754]
We consider two families of estimation problems: high-dimensional regression and low-dimensional matrix estimation.
We derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge.
These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms.
arXiv Detail & Related papers (2020-02-28T18:13:47Z)