Stochasticity helps to navigate rough landscapes: comparing
gradient-descent-based algorithms in the phase retrieval problem
- URL: http://arxiv.org/abs/2103.04902v1
- Date: Mon, 8 Mar 2021 17:06:18 GMT
- Title: Stochasticity helps to navigate rough landscapes: comparing
gradient-descent-based algorithms in the phase retrieval problem
- Authors: Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborová
- Abstract summary: We investigate how gradient-based algorithms such as gradient descent, stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate the non-convex loss landscape of the phase retrieval problem.
We apply dynamical mean-field theory from statistical physics to characterize the full trajectories of these algorithms in their continuous-time limit, with a warm start, and for large system sizes.
- Score: 8.164433158925593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we investigate how gradient-based algorithms such as gradient
descent, (multi-pass) stochastic gradient descent, its persistent variant, and
the Langevin algorithm navigate non-convex loss landscapes and which of them is
able to reach the best generalization error at limited sample complexity. We
consider the loss landscape of the high-dimensional phase retrieval problem as
a prototypical highly non-convex example. We observe that for phase retrieval
the stochastic variants of gradient descent are able to reach perfect
generalization for regions of control parameters where the gradient descent
algorithm is not. We apply dynamical mean-field theory from statistical physics
to characterize analytically the full trajectories of these algorithms in their
continuous-time limit, with a warm start, and for large system sizes. We
further unveil several intriguing properties of the landscape and the
algorithms such as that the gradient descent can obtain better generalization
properties from less informed initializations.
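To make the comparison concrete, here is a minimal NumPy sketch of the three dynamics named in the abstract, run on a toy real-valued phase retrieval instance; the dimension, sample ratio, step size, batch size, temperature and warm-start overlap are illustrative assumptions, and this is not the paper's code nor its dynamical mean-field analysis.

```python
# Sketch only: illustrative parameters, not the paper's code or its DMFT analysis.
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 800                                   # alpha = n/d = 4 samples per dimension
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)                  # unit-norm hidden signal
y = (X @ w_star) ** 2                             # phaseless measurements

def grad(w, idx):
    """Gradient of the quartic loss (1/4)((x.w)^2 - y)^2 averaged over the samples in idx."""
    z = X[idx] @ w
    return X[idx].T @ ((z ** 2 - y[idx]) * z) / len(idx)

def warm_start(m0=0.3):
    """Unit-norm initialization with overlap m0 = w.w* with the signal."""
    v = rng.normal(size=d)
    v -= (v @ w_star) * w_star                    # component orthogonal to w*
    v /= np.linalg.norm(v)
    return m0 * w_star + np.sqrt(1 - m0 ** 2) * v

def run(algo, steps=3000, lr=0.01, batch=32, temperature=1e-4):
    w = warm_start()
    full = np.arange(n)
    for _ in range(steps):
        if algo == "gd":                          # deterministic gradient descent
            w -= lr * grad(w, full)
        elif algo == "sgd":                       # multi-pass mini-batch SGD
            w -= lr * grad(w, rng.choice(n, size=batch, replace=False))
        else:                                     # Langevin: full gradient plus thermal noise
            w -= lr * grad(w, full)
            w += np.sqrt(2 * lr * temperature) * rng.normal(size=d)
    return abs(w @ w_star)                        # overlap with the signal (1 = recovery)

for algo in ("gd", "sgd", "langevin"):
    print(algo, "final overlap:", round(run(algo), 3))
```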
Related papers
- Forward Gradient-Based Frank-Wolfe Optimization for Memory Efficient Deep Neural Network Training [0.0]
This paper analyzes the performance of the well-known Frank-Wolfe algorithm when only a Projected Forward Gradient, rather than the exact gradient, is available.
We show that the proposed algorithm converges to the optimal solution at a sub-linear rate.
In contrast, the standard Frank-Wolfe algorithm, when provided with access only to the Projected Forward Gradient, fails to converge to the optimal solution.
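As a rough illustration of the baseline discussed above, the sketch below runs standard Frank-Wolfe over an l1-ball, replacing the exact gradient with a forward-gradient estimate (a single directional derivative along a random Gaussian direction); the least-squares objective, constraint radius and step-size schedule are assumptions made for the example, not the paper's proposed method.

```python
# Sketch of standard Frank-Wolfe driven by a forward-gradient estimate; illustrative setup.
import numpy as np

rng = np.random.default_rng(1)
d = 50
A = rng.normal(size=(d, d)) / np.sqrt(d)
b = rng.normal(size=d)
radius = 5.0                                   # l1-ball constraint ||x||_1 <= radius

def forward_gradient(x):
    """Unbiased gradient estimate from one directional derivative of 0.5*||Ax - b||^2."""
    v = rng.normal(size=d)                     # random tangent direction
    jvp = (A @ x - b) @ (A @ v)                # directional derivative along v
    return jvp * v                             # E[g_hat] equals the true gradient

def lmo_l1(g):
    """Linear minimization oracle over the l1 ball: a signed vertex."""
    i = np.argmax(np.abs(g))
    s = np.zeros(d)
    s[i] = -radius * np.sign(g[i])
    return s

x = np.zeros(d)
for t in range(1, 2001):
    g_hat = forward_gradient(x)
    s = lmo_l1(g_hat)
    gamma = 2.0 / (t + 2)                      # classical Frank-Wolfe step-size schedule
    x = (1 - gamma) * x + gamma * s            # convex combination stays inside the l1 ball
print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```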
arXiv Detail & Related papers (2024-03-19T07:25:36Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or the adaptive gradient descent.
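A hedged sketch of the general idea, on a logistic regression example chosen for illustration (the projection set, step sizes and batch size are assumptions, not the paper's exact estimator):

```python
# Sketch: a few projected SGD steps on a logistic log-likelihood, then one
# Fisher-scoring correction  theta <- theta + I(theta)^{-1} * score(theta).
import numpy as np

rng = np.random.default_rng(2)
n, d = 2000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true)))

def project(theta, radius=10.0):
    """Projection onto a Euclidean ball (an illustrative constraint set)."""
    nrm = np.linalg.norm(theta)
    return theta if nrm <= radius else theta * radius / nrm

def score(theta, idx):
    """Gradient of the mean log-likelihood over the samples in idx."""
    p = 1.0 / (1.0 + np.exp(-X[idx] @ theta))
    return X[idx].T @ (y[idx] - p) / len(idx)

# Stage 1: projected stochastic gradient ascent (a crude preliminary estimator).
theta = np.zeros(d)
for t in range(200):
    idx = rng.choice(n, size=50, replace=False)
    theta = project(theta + 0.5 * score(theta, idx))

# Stage 2: a single Fisher-scoring step using the full sample (the one-step correction).
p = 1.0 / (1.0 + np.exp(-X @ theta))
fisher = (X * (p * (1 - p))[:, None]).T @ X / n
theta_one_step = theta + np.linalg.solve(fisher, score(theta, np.arange(n)))
print("error before:", np.linalg.norm(theta - theta_true),
      "after:", np.linalg.norm(theta_one_step - theta_true))
```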
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Non asymptotic analysis of Adaptive stochastic gradient algorithms and
applications [0.0]
This paper is devoted to the non-asymptotic analysis of these adaptive stochastic gradient algorithms for strongly convex objectives.
All the theoretical results are adapted to linear regression and regularized generalized linear models, for both the Adagrad and stochastic Newton algorithms.
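For reference, a bare-bones Adagrad loop on a strongly convex least-squares toy problem; it only illustrates the per-coordinate adaptive step size that such analyses target, with illustrative hyperparameters, and does not reproduce the paper's non-asymptotic bounds.

```python
# Minimal Adagrad sketch: theta_j <- theta_j - eta * g_j / sqrt(G_j + eps),
# where G accumulates squared gradients coordinate by coordinate.
import numpy as np

rng = np.random.default_rng(3)
n, d = 1000, 10
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
G = np.zeros(d)                                  # running sum of squared gradients
eta, eps = 0.5, 1e-8
for t in range(5000):
    i = rng.integers(n)                          # one sample per step
    g = (X[i] @ theta - y[i]) * X[i]             # stochastic gradient of 0.5*(x.theta - y)^2
    G += g ** 2
    theta -= eta * g / np.sqrt(G + eps)
print("estimation error:", np.linalg.norm(theta - theta_true))
```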
arXiv Detail & Related papers (2023-03-01T07:36:03Z) - Stability vs Implicit Bias of Gradient Methods on Separable Data and
Beyond [33.593203156666746]
We focus on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification.
We give an additional unified explanation for this generalization, based on two properties of the predictor that we refer to as realizability and self-boundedness.
In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
arXiv Detail & Related papers (2022-02-27T19:56:36Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
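For orientation, here is the classic SARAH recursion on a least-squares toy problem; the full-gradient step at the start of each epoch is exactly what the random-reshuffled variant above replaces with aggregated gradients, and that variant is not reproduced here. Problem sizes and the step size are illustrative.

```python
# Classic SARAH sketch (not the paper's random-reshuffled variant).
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 20
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_true = rng.normal(size=d)
y = X @ w_true + 0.05 * rng.normal(size=n)

def grad_i(w, i):
    """Gradient of one least-squares component 0.5*(x_i.w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
eta = 0.2
for epoch in range(30):
    v = X.T @ (X @ w - y) / n                    # full gradient (the step the paper removes)
    w_prev, w = w, w - eta * v
    for _ in range(n):                           # inner loop with the recursive estimator
        i = rng.integers(n)
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - eta * v
print("error:", np.linalg.norm(w - w_true))
```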
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
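A hedged NumPy sketch of annealed importance sampling with unadjusted Langevin transitions, i.e. with the Metropolis-Hastings accept/reject step dropped as described above; the Gaussian target, annealing schedule and step size are illustrative, and since no autodiff framework is used the differentiability itself is not exercised here.

```python
# AIS with unadjusted Langevin transitions (no MH correction); illustrative Gaussian example.
import numpy as np

rng = np.random.default_rng(5)
d, n_chains, n_steps = 2, 2000, 200
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])     # target covariance, mean zero
Sigma_inv = np.linalg.inv(Sigma)

def log_gamma0(x):                              # unnormalized log prior N(0, I)
    return -0.5 * np.sum(x ** 2, axis=1)

def log_gammaT(x):                              # unnormalized log target N(0, Sigma)
    return -0.5 * np.sum((x @ Sigma_inv) * x, axis=1)

def log_gamma(x, beta):                         # geometric annealing path
    return (1 - beta) * log_gamma0(x) + beta * log_gammaT(x)

def grad_log_gamma(x, beta):
    return -(1 - beta) * x - beta * (x @ Sigma_inv)

betas = np.linspace(0.0, 1.0, n_steps + 1)
x = rng.normal(size=(n_chains, d))              # exact samples from the prior
log_w = np.zeros(n_chains)
eps = 0.05
for k in range(1, n_steps + 1):
    log_w += log_gamma(x, betas[k]) - log_gamma(x, betas[k - 1])
    # one unadjusted Langevin step targeting the k-th intermediate distribution
    x = x + eps * grad_log_gamma(x, betas[k]) + np.sqrt(2 * eps) * rng.normal(size=x.shape)

# Estimate of log(Z_T / Z_0); the exact value here is 0.5 * log det(Sigma).
est = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
print("AIS estimate:", est, "exact:", 0.5 * np.linalg.slogdet(Sigma)[1])
```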
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
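A generic sketch of the zeroth-order idea with a two-point random-direction gradient estimator (not the paper's hybrid, coordinate-importance-sampling scheme); only function evaluations are used, never analytic gradients.

```python
# Two-point zeroth-order gradient estimate plus plain descent; illustrative black-box objective.
import numpy as np

rng = np.random.default_rng(6)
d = 30
x_star = rng.normal(size=d)
f = lambda x: np.sum((x - x_star) ** 2)          # black-box objective (queries only)

def zo_gradient(x, mu=1e-3, n_dirs=10):
    """Average of (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random directions u."""
    g = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.normal(size=d)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

x = np.zeros(d)
for _ in range(2000):
    x -= 0.01 * zo_gradient(x)                   # descent on the zeroth-order estimate
print("final objective:", f(x))
```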
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow.
In the full-batch limit, we recover the standard gradient flow.
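A minimal sketch of that setting: mini-batch SGD for a single-layer network with logistic loss on a two-cluster high-dimensional Gaussian mixture; setting the batch size equal to the sample size recovers full-batch gradient descent, the discrete counterpart of gradient flow. Cluster separation, noise level and step size are illustrative.

```python
# Mini-batch SGD on a two-cluster Gaussian mixture with a single-layer (perceptron) model.
import numpy as np

rng = np.random.default_rng(7)
d, n, batch = 300, 1200, 32
Delta = 0.5                                      # within-cluster noise variance
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)                         # unit-norm cluster mean direction
labels = rng.choice([-1.0, 1.0], size=n)
X = labels[:, None] * mu[None, :] + np.sqrt(Delta) * rng.normal(size=(n, d))

w = np.zeros(d)
lr = 0.1
for step in range(5000):
    idx = rng.choice(n, size=batch, replace=False)
    margins = labels[idx] * (X[idx] @ w)
    weights = 0.5 * (1.0 - np.tanh(margins / 2.0))   # sigmoid(-margin), numerically stable
    # gradient of the logistic loss log(1 + exp(-y * x.w)) averaged over the batch
    g = -(X[idx] * (labels[idx] * weights)[:, None]).mean(axis=0)
    w -= lr * g
train_acc = np.mean(np.sign(X @ w) == labels)
print("training accuracy:", train_acc)
```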
arXiv Detail & Related papers (2020-06-10T22:49:41Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze an optimistic variant of stochastic gradient methods for nonconvex min-max problems.
Our experiments show empirically that adaptive gradient algorithms can outperform their non-adaptive counterparts in GAN training.
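As a toy illustration of the optimistic update in a min-max setting, the sketch below compares plain simultaneous gradient descent-ascent with its optimistic variant on a bilinear problem; this is a generic example, not the adaptive (Adagrad-style) algorithm analyzed in the paper.

```python
# Optimistic vs. plain gradient descent-ascent on min_x max_y f(x, y) = x.y.
# Plain GDA spirals away from the saddle at the origin; the optimistic update converges.
import numpy as np

d, eta, steps = 5, 0.1, 2000

def run(optimistic):
    x, y = np.ones(d), np.ones(d)
    gx_prev, gy_prev = y.copy(), x.copy()         # gradients from the "previous" step
    for _ in range(steps):
        gx, gy = y.copy(), x.copy()               # grad_x(x.y) = y, grad_y(x.y) = x
        if optimistic:                            # extrapolate using the last gradient
            x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        else:                                     # plain simultaneous descent-ascent
            x, y = x - eta * gx, y + eta * gy
        gx_prev, gy_prev = gx, gy
    return float(np.linalg.norm(x) + np.linalg.norm(y))

print("GDA distance from saddle:", run(False))
print("OGDA distance from saddle:", run(True))
```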
arXiv Detail & Related papers (2019-12-26T22:10:10Z)