Sharp global convergence guarantees for iterative nonconvex
optimization: A Gaussian process perspective
- URL: http://arxiv.org/abs/2109.09859v1
- Date: Mon, 20 Sep 2021 21:48:19 GMT
- Title: Sharp global convergence guarantees for iterative nonconvex
optimization: A Gaussian process perspective
- Authors: Kabir Aladin Chandrasekher, Ashwin Pananjady, Christos Thrampoulidis
- Abstract summary: We develop a general recipe for analyzing the convergence of iterative algorithms for a class of regression models.
Crucially, this deterministic sequence accurately captures both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime.
We show sharp global convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent.
- Score: 30.524043513721168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a general class of regression models with normally distributed
covariates, and the associated nonconvex problem of fitting these models from
data. We develop a general recipe for analyzing the convergence of iterative
algorithms for this task from a random initialization. In particular, provided
each iteration can be written as the solution to a convex optimization problem
satisfying some natural conditions, we leverage Gaussian comparison theorems to
derive a deterministic sequence that provides sharp upper and lower bounds on
the error of the algorithm with sample-splitting. Crucially, this deterministic
sequence accurately captures both the convergence rate of the algorithm and the
eventual error floor in the finite-sample regime, and is distinct from the
commonly used "population" sequence that results from taking the
infinite-sample limit. We apply our general framework to derive several
concrete consequences for parameter estimation in popular statistical models
including phase retrieval and mixtures of regressions. Provided the sample size
scales near-linearly in the dimension, we show sharp global convergence rates
for both higher-order algorithms based on alternating updates and first-order
algorithms based on subgradient descent. These corollaries, in turn, yield
multiple consequences, including: (a) Proof that higher-order algorithms can
converge significantly faster than their first-order counterparts (and
sometimes super-linearly), even if the two share the same population update and
(b) Intricacies in super-linear convergence behavior for higher-order
algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensitive to
the noise level in the problem. We complement these results with extensive
numerical experiments, which show excellent agreement with our theoretical
predictions.
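To make consequence (a) concrete, here is a small, self-contained simulation in the spirit of the paper's numerical experiments: noiseless phase retrieval with Gaussian covariates, comparing a higher-order update that re-solves a convex least-squares subproblem at every iteration against a plain subgradient step, both run with sample splitting from a random initialization. The dimensions, step size, and iteration count are illustrative assumptions, not the authors' exact configuration.
```python
import numpy as np

# Illustrative simulation (not the authors' code): noiseless phase retrieval
# y_i = |<x_i, theta*>| with Gaussian covariates, comparing a higher-order
# alternating update against a plain subgradient step, both with sample
# splitting (a fresh batch of data at every iteration).

rng = np.random.default_rng(0)
d, n_per_iter, n_iters = 100, 500, 12            # assumed problem sizes
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

def fresh_batch():
    X = rng.standard_normal((n_per_iter, d))
    return X, np.abs(X @ theta_star)             # noiseless observations

def error(theta):
    # distance to theta* up to the global sign ambiguity of phase retrieval
    return min(np.linalg.norm(theta - theta_star),
               np.linalg.norm(theta + theta_star))

def alternating_update(theta, X, y):
    # impute signs with the current iterate, then solve the resulting
    # convex (least-squares) subproblem exactly
    s = np.sign(X @ theta)
    theta_next, *_ = np.linalg.lstsq(X, s * y, rcond=None)
    return theta_next

def subgradient_update(theta, X, y, step=0.5):
    # one subgradient step on the nonsmooth squared-residual loss
    r = np.abs(X @ theta) - y
    g = (X * (r * np.sign(X @ theta))[:, None]).mean(axis=0)
    return theta - step * g

theta0 = rng.standard_normal(d)
theta0 /= np.linalg.norm(theta0)                 # shared random initialization
theta_am, theta_sg = theta0, theta0
for t in range(n_iters):
    X, y = fresh_batch()                         # sample splitting
    theta_am = alternating_update(theta_am, X, y)
    theta_sg = subgradient_update(theta_sg, X, y)
    print(f"iter {t + 1:2d}   alternating error {error(theta_am):.3e}   "
          f"subgradient error {error(theta_sg):.3e}")
```
With both methods fed the same fresh data at each step, the alternating update typically drives the error down much faster than the subgradient step, which is the qualitative behaviour the abstract describes.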
Related papers
- Last Iterate Convergence of Incremental Methods and Applications in Continual Learning [10.811735264028348]
Motivated by applications in continual learning, we obtain convergence guarantees for the last iterate of both incremental gradient and incremental proximal methods.
We study incremental proximal methods as a model of continual learning with generalization and argue that a large amount of regularization is crucial to preventing catastrophic forgetting.
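As a rough illustration of the incremental proximal update described above (a sketch under assumed toy settings, not the paper's code): each task contributes a least-squares loss, and each step solves argmin_w ||X_k w - y_k||^2 / (2 n_k) + (1/(2*eta)) ||w - w_prev||^2, so the proximal parameter eta controls how far a new task can move the iterate.
```python
import numpy as np

# Illustrative sketch (assumed setup): incremental proximal updates over a
# sequence of least-squares "tasks".  A smaller eta (stronger proximal term)
# keeps the last iterate closer to what was learned on earlier tasks.

def incremental_proximal(tasks, eta, d):
    w = np.zeros(d)
    for X, y in tasks:
        n = len(y)
        # closed-form proximal step for the quadratic task loss
        A = X.T @ X / n + np.eye(d) / eta
        b = X.T @ y / n + w / eta
        w = np.linalg.solve(A, b)
    return w

rng = np.random.default_rng(1)
d = 20
w_true = rng.standard_normal(d)
tasks = []
for _ in range(5):
    X = rng.standard_normal((100, d))
    tasks.append((X, X @ w_true + 0.1 * rng.standard_normal(100)))

for eta in (100.0, 1.0, 0.01):
    w = incremental_proximal(tasks, eta, d)
    print(f"eta={eta:6.2f}  error of last iterate: "
          f"{np.linalg.norm(w - w_true):.3f}")
```
Here all tasks share one ground truth, so the sketch only shows the mechanics of the update; the paper's argument concerns how the same regularization protects earlier tasks when they differ.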
arXiv Detail & Related papers (2024-03-11T16:24:26Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
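A rough sketch of the ingredients named above, under assumptions that are not from the paper: median-of-means block estimates stand in for the robust gradient and Hessian estimators, the objective is least squares with heavy-tailed noise, and the Newton direction is obtained with a small conjugate-gradient loop.
```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's algorithm verbatim):
# Newton-type steps for least squares with heavy-tailed noise, with the
# gradient and Hessian replaced by median-of-means block estimates and the
# Newton direction computed by conjugate gradients.

def median_of_means(block_stats):
    # entrywise median of per-block averages
    return np.median(block_stats, axis=0)

def conjugate_gradient(H, g, iters=50, tol=1e-10):
    # solve H d = g for the (approximate) Newton direction
    d = np.zeros_like(g)
    r = g.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = H @ p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

rng = np.random.default_rng(2)
n, d, n_blocks = 2000, 10, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + rng.standard_t(df=1.5, size=n)   # heavy-tailed noise

w = np.zeros(d)
blocks = np.array_split(rng.permutation(n), n_blocks)
for step in range(5):
    grad_blocks, hess_blocks = [], []
    for idx in blocks:
        resid = X[idx] @ w - y[idx]
        grad_blocks.append(X[idx].T @ resid / len(idx))
        hess_blocks.append(X[idx].T @ X[idx] / len(idx))
    g = median_of_means(np.stack(grad_blocks))
    H = median_of_means(np.stack(hess_blocks))
    w -= conjugate_gradient(H, g)                  # robust Newton step
    print(f"step {step + 1}: error {np.linalg.norm(w - w_true):.3f}")
```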
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of stochastic approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
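A minimal sketch of stochastic mirror descent for sparse linear regression, under assumptions that are standard in this literature but not necessarily the paper's exact algorithm: the potential is psi(x) = 0.5 * ||x||_p^2 with p = 1 + 1/log(d), and the step size, sample sizes, and single-pass schedule below are illustrative.
```python
import numpy as np

# Illustrative sketch (assumed setup): stochastic mirror descent for sparse
# regression with the l_p potential psi(x) = 0.5 * ||x||_p^2, a standard
# sparsity-adapted geometry.  One fresh regressor is used per step.

rng = np.random.default_rng(3)
d, s, n_steps = 100, 5, 3000
p = 1.0 + 1.0 / np.log(d)
q = p / (p - 1.0)                      # dual exponent

theta_star = np.zeros(d)
theta_star[rng.choice(d, s, replace=False)] = rng.standard_normal(s)

def grad_potential(x, r):
    # gradient of 0.5 * ||x||_r^2; with r = p it maps primal to dual,
    # with r = q it maps dual back to primal
    norm = np.linalg.norm(x, ord=r)
    if norm == 0.0:
        return np.zeros_like(x)
    return norm ** (2.0 - r) * np.sign(x) * np.abs(x) ** (r - 1.0)

x = np.zeros(d)
eta = 0.01
for t in range(n_steps):
    a = rng.standard_normal(d)         # fresh Gaussian regressor
    y = a @ theta_star + 0.1 * rng.standard_normal()
    g = (a @ x - y) * a                # stochastic gradient of squared loss
    z = grad_potential(x, p) - eta * g # step in the dual (mirror) space
    x = grad_potential(z, q)           # map back to the primal
print("final estimation error:", np.linalg.norm(x - theta_star))
```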
arXiv Detail & Related papers (2022-10-23T23:23:23Z)
- Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization [5.900674344455754]
We present a technique for estimating properties of a rank-one random matrix from i.i.d. measurements.
We show sharp convergence guarantees, including exact recovery in a single step.
Our analysis also exposes several other properties of this problem.
arXiv Detail & Related papers (2022-07-20T05:31:05Z)
- First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems [88.58409977434269]
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs).
Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method.
We provide nonasymptotic theoretical guarantees for these algorithms.
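As a toy illustration of the quadratic-penalty idea only (an assumed two-player example with a single shared constraint, far simpler than the NGNEPs treated in the paper), each player takes first-order steps on its own cost plus a quadratic penalty on the shared constraint violation.
```python
import numpy as np

# Assumed toy example (not from the paper): players minimize
#   f1(x) = (x1 - 1)^2 + x1*x2   and   f2(x) = (x2 - 1)^2 + x1*x2
# subject to the shared constraint x1 + x2 <= 1, handled by a quadratic
# penalty (rho/2) * max(0, x1 + x2 - 1)^2 added to each player's cost.

rho = 10.0          # penalty parameter
step = 0.05
x = np.array([0.0, 0.0])

def penalized_gradients(x, rho):
    violation = max(0.0, x[0] + x[1] - 1.0)
    g1 = 2.0 * (x[0] - 1.0) + x[1] + rho * violation   # player 1's partial gradient
    g2 = 2.0 * (x[1] - 1.0) + x[0] + rho * violation   # player 2's partial gradient
    return np.array([g1, g2])

for _ in range(500):
    x = x - step * penalized_gradients(x, rho)

print("approximate equilibrium:", x)   # approaches (1/2, 1/2) as rho grows
```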
arXiv Detail & Related papers (2022-04-07T00:11:05Z)
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
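A minimal sketch of gradient clipping on a toy strongly convex problem with heavy-tailed (infinite-variance) noise; the clipping level and decaying stepsize below are generic choices, not the stepsize rules proposed in the paper.
```python
import numpy as np

# Illustrative sketch (assumed setup): SGD with gradient clipping.  Clipping
# at threshold `lam` prevents rare, huge noise realizations from destroying
# an iterate, which is the mechanism behind high-probability guarantees
# under heavy-tailed noise.

rng = np.random.default_rng(4)
d, n_steps, lam = 50, 5000, 5.0
x_star = rng.standard_normal(d)
x = np.zeros(d)

def clip(g, lam):
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

for t in range(n_steps):
    # stochastic gradient of 0.5 * ||x - x_star||^2 with heavy-tailed noise
    noise = rng.standard_t(df=1.5, size=d)        # infinite variance
    g = (x - x_star) + noise
    x = x - 0.5 / np.sqrt(t + 1) * clip(g, lam)

print("final error:", np.linalg.norm(x - x_star))
```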
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate high computational gains compared to the full problem solver, for both block-coordinate and gradient solvers.
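A minimal working-set loop for sparse regression, under simplifying assumptions that are not the paper's: an l1 penalty and a plain coordinate-descent inner solver are used for readability, whereas the paper targets non-convex regularizers. Features are added greedily by optimality violation and the subproblem is re-solved on the current working set.
```python
import numpy as np

# Illustrative working-set sketch (l1 penalty assumed for simplicity).

rng = np.random.default_rng(5)
n, d, lam = 200, 500, 0.2
beta_true = np.zeros(d)
beta_true[:5] = 3.0
X = rng.standard_normal((n, d))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def solve_subproblem(X, y, support, lam, n_sweeps=100):
    # coordinate descent for the lasso restricted to the working set
    beta = np.zeros(X.shape[1])
    col_sq = (X[:, support] ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for k, j in enumerate(support):
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j
            beta[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_sq[k]
    return beta

working_set = []
beta = np.zeros(d)
for outer in range(10):
    # score excluded features by their optimality (KKT) violation
    resid = y - X @ beta
    scores = np.abs(X.T @ resid) / n
    if working_set:
        scores[working_set] = 0.0
    if scores.max() <= lam + 1e-6:
        break                          # all excluded features satisfy optimality
    working_set.append(int(scores.argmax()))
    beta = solve_subproblem(X, y, working_set, lam)

print("selected features:", sorted(working_set))
print("estimation error:", np.linalg.norm(beta - beta_true))
```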
arXiv Detail & Related papers (2020-06-24T07:40:31Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
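For reference (using a standard convention; the paper's exact normalization may differ), the Moreau envelope of $f$ with parameter $\lambda > 0$ is $f_\lambda(x) = \min_y \{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \}$, and its gradient satisfies $\nabla f_\lambda(x) = \tfrac{1}{\lambda}\left(x - \mathrm{prox}_{\lambda f}(x)\right)$, so the $\tilde{\mathcal{O}}(t^{-1/4})$ rate above controls a standard near-stationarity measure for weakly convex problems.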
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
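A minimal sketch of the sketch-and-precondition idea with a Gaussian embedding, one of the two embeddings mentioned above; the subsampled Hadamard variant and the optimal step-size choices analyzed in the paper are not reproduced here, and the sketch size and step below are heuristic assumptions.
```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's exact method): randomized
# preconditioning for overdetermined least squares.  A Gaussian projection S
# is applied to A, the R factor of QR(SA) serves as a preconditioner, and
# plain gradient descent is run on the well-conditioned system A R^{-1},
# so the iteration count no longer depends on cond(A).

rng = np.random.default_rng(6)
n, d = 4000, 100
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag(np.logspace(0, 4, d)) @ V.T        # condition number ~1e4
b = rng.standard_normal(n)

m = 8 * d                                          # sketch size
S = rng.standard_normal((m, n)) / np.sqrt(m)       # Gaussian embedding
_, R = np.linalg.qr(S @ A)                         # preconditioner

z = np.zeros(d)                                    # iterate in z = R x coordinates
for _ in range(50):
    x = np.linalg.solve(R, z)
    g = A.T @ (A @ x - b)                          # least-squares gradient in x
    z = z - 0.5 * np.linalg.solve(R.T, g)          # preconditioned gradient step
x = np.linalg.solve(R, z)

x_direct, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error vs. direct solve:",
      np.linalg.norm(x - x_direct) / np.linalg.norm(x_direct))
```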
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.