The estimation error of general first order methods
- URL: http://arxiv.org/abs/2002.12903v2
- Date: Tue, 3 Mar 2020 17:44:58 GMT
- Title: The estimation error of general first order methods
- Authors: Michael Celentano, Andrea Montanari, Yuchen Wu
- Abstract summary: We consider two families of estimation problems: high-dimensional regression and low-rank matrix estimation.
We derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge.
These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms.
- Score: 12.472245917779754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern large-scale statistical models require the estimation of thousands
to millions of parameters. This is often accomplished by iterative algorithms such
as gradient descent, projected gradient descent or their accelerated versions.
What are the fundamental limits to these approaches? This question is well
understood from an optimization viewpoint when the underlying objective is
convex. Work in this area characterizes the gap to global optimality as a
function of the number of iterations. However, these results have only indirect
implications in terms of the gap to statistical optimality.
Here we consider two families of high-dimensional estimation problems:
high-dimensional regression and low-rank matrix estimation, and introduce a
class of `general first order methods' that aim at efficiently estimating the
underlying parameters. This class of algorithms is broad enough to include
classical first order optimization (for convex and non-convex objectives), but
also other types of algorithms. Under a random design assumption, we derive
lower bounds on the estimation error that hold in the high-dimensional
asymptotics in which both the number of observations and the number of
parameters diverge. These lower bounds are optimal in the sense that there
exist algorithms whose estimation error matches the lower bounds up to
asymptotically negligible terms. We illustrate our general results through
applications to sparse phase retrieval and sparse principal component analysis.
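For concreteness, here is a minimal sketch, not taken from the paper, of one member of the first order family mentioned in the abstract: proximal gradient descent (ISTA) for sparse linear regression. Roughly speaking, the general first order methods studied in the paper are built from the same ingredients that appear below, namely matrix-vector products with X and its transpose interleaved with entrywise nonlinear updates. The synthetic data, penalty level, and step size are illustrative assumptions of this sketch.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise soft-thresholding, the proximal map of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam, n_iters=200, step=None):
    """Proximal gradient descent (ISTA) for the Lasso objective
        0.5 * ||y - X @ theta||^2 + lam * ||theta||_1.
    Each iteration uses one product with X, one with X.T, and an
    entrywise nonlinearity."""
    n, p = X.shape
    if step is None:
        # inverse of the Lipschitz constant of the smooth part
        step = 1.0 / np.linalg.norm(X, ord=2) ** 2
    theta = np.zeros(p)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y)  # gradient of the quadratic loss
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Toy usage on a synthetic sparse regression instance.
rng = np.random.default_rng(0)
n, p, k = 200, 500, 10
X = rng.standard_normal((n, p)) / np.sqrt(n)
theta_star = np.zeros(p)
theta_star[:k] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = ista(X, y, lam=0.05)
print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```

The paper's lower bounds apply to the entire class of such iterations, not only to this particular update rule.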
Related papers
- Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
We obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Computationally Efficient and Statistically Optimal Robust
High-Dimensional Linear Regression [15.389011827844572]
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically.
In this paper, we introduce an algorithm that handles the regression problem under both Gaussian noise and heavy-tailed noise with bounded (1 + epsilon)-th moments.
arXiv Detail & Related papers (2023-05-10T14:31:03Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal rate of convergence of the estimation error under weak assumptions on the regressor distribution.
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Sharp global convergence guarantees for iterative nonconvex
optimization: A Gaussian process perspective [30.524043513721168]
We develop a general recipe for analyzing the convergence of iterative algorithms for a class of regression models.
In particular, we accurately capture both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime.
We show sharp convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent.
arXiv Detail & Related papers (2021-09-20T21:48:19Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in optimization algorithms for non-convex optimization, arising in training deep neural networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate on two popular deep learning tasks the empirical advantages of the proposed methods over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm (a generic zeroth-order sketch is given after this list).
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Stochastic Optimization for Regularized Wasserstein Estimators [10.194798773447879]
We introduce an algorithm to solve a regularized version of the problem of computing Wasserstein estimators, with a time per step which is sublinear in the natural dimensions of the problem.
We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters.
arXiv Detail & Related papers (2020-02-20T12:04:05Z)
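As a generic illustration of the randomized approximation of projected gradient descent mentioned in the derivative-free optimization entry above, the sketch below uses a plain two-point zeroth-order gradient estimate with Euclidean projection. It is a toy under stated assumptions, not the estimator from that paper (which exploits higher-order smoothness); the objective, step size, and constraint radius are placeholders.

```python
import numpy as np

def zo_projected_gradient(f, x0, radius, step, n_iters=500, h=1e-3, seed=0):
    """Projected gradient descent driven by a two-point zeroth-order
    gradient estimate: at each step the gradient is approximated from two
    function evaluations along a random unit direction, then the iterate
    is projected back onto the Euclidean ball of the given radius."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(n_iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        # symmetric two-point estimate of the directional derivative, rescaled
        g = d * (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u
        x = x - step * g
        nrm = np.linalg.norm(x)
        if nrm > radius:  # projection onto {x : ||x|| <= radius}
            x *= radius / nrm
    return x

# Toy usage: minimize a strongly convex quadratic using function values only.
target = np.array([0.5, -0.3, 0.2])
f = lambda x: float(np.sum((x - target) ** 2))
x_hat = zo_projected_gradient(f, x0=np.zeros(3), radius=1.0, step=0.05)
print("distance to minimizer:", np.linalg.norm(x_hat - target))
```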