Leveraging Non-uniformity in First-order Non-convex Optimization
- URL: http://arxiv.org/abs/2105.06072v1
- Date: Thu, 13 May 2021 04:23:07 GMT
- Title: Leveraging Non-uniformity in First-order Non-convex Optimization
- Authors: Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans
- Abstract summary: A non-uniform refinement of classical smoothness and \L{}ojasiewicz conditions leads to \emph{Non-uniform Smoothness} (NS) and the \emph{Non-uniform \L{}ojasiewicz inequality} (N\L{}).
These new definitions inspire geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
- Score: 93.6817946818977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical global convergence results for first-order methods rely on uniform
smoothness and the \L{}ojasiewicz inequality. Motivated by properties of
objective functions that arise in machine learning, we propose a non-uniform
refinement of these notions, leading to \emph{Non-uniform Smoothness} (NS) and
\emph{Non-uniform \L{}ojasiewicz inequality} (N\L{}). The new definitions
inspire new geometry-aware first-order methods that are able to converge to
global optimality faster than the classical $\Omega(1/t^2)$ lower bounds. To
illustrate the power of these geometry-aware methods and their corresponding
non-uniform analysis, we consider two important problems in machine learning:
policy gradient optimization in reinforcement learning (PG), and generalized
linear model training in supervised learning (GLM). For PG, we find that
normalizing the gradient ascent method can accelerate convergence to
$O(e^{-t})$ while incurring less overhead than existing algorithms. For GLM, we
show that geometry-aware normalized gradient descent can also achieve a linear
convergence rate, which significantly improves the best known results. We
additionally show that the proposed geometry-aware descent methods escape
landscape plateaus faster than standard gradient descent. Experimental results
are used to illustrate and complement the theoretical findings.
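For context, the uniform conditions that NS and N\L{} refine are the classical ones recalled below; the paper's non-uniform versions (see the paper for the exact definitions) replace the fixed constants $\beta$ and $C$ with functions of $\theta$, which is what the geometry-aware methods exploit.
\[
\|\nabla f(\theta) - \nabla f(\theta')\| \le \beta\,\|\theta - \theta'\| \quad \text{(uniform $\beta$-smoothness)},
\qquad
\|\nabla f(\theta)\| \ge C\,\bigl(f(\theta) - f^{*}\bigr)^{1/2} \quad \text{(Polyak-\L{}ojasiewicz)}.
\]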
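To make the geometry-aware idea concrete, here is a minimal sketch, assuming a plain logistic-regression GLM with a fixed step size. It is not the authors' GNGD or normalized policy gradient method (whose normalizations and step sizes are problem-specific); it only illustrates normalizing the update so that vanishing gradients on plateaus do not stall progress. The data, step size, and helper names are made up for the example.

```python
# Minimal sketch of normalized gradient descent on a logistic-regression GLM.
# Not the paper's exact algorithm; only the basic normalization idea.
import numpy as np

def logistic_loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def normalized_gradient_descent(w0, X, y, eta=0.1, steps=500, eps=1e-12):
    """Gradient descent with the update direction normalized by its norm."""
    w = w0.copy()
    for _ in range(steps):
        _, g = logistic_loss_and_grad(w, X, y)
        w -= eta * g / (np.linalg.norm(g) + eps)  # unit-length, geometry-aware step
    return w

# Hypothetical toy data, just to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
w_hat = normalized_gradient_descent(np.zeros(5), X, y)
print("final loss:", logistic_loss_and_grad(w_hat, X, y)[0])
```

The only change from vanilla gradient descent is dividing by the gradient norm, so every update has fixed length; the paper's analysis shows how normalizations matched to the NS/N\L{} structure of PG and GLM objectives yield the accelerated rates quoted above.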
Related papers
- Extended convexity and smoothness and their applications in deep learning [0.0]
In this paper, we introduce the $\mathcal{H}$-smoothness algorithm for settings where gradient smoothness and strong convexity are not completely understood.
The effectiveness of the proposed methodology is validated through experiments.
arXiv Detail & Related papers (2024-10-08T08:40:07Z)
- A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality [5.35599092568615]
We show that AdaGrad and Adam converge linearly when the cost function is smooth and satisfies the Polyak-Lojasiewicz (PL) inequality; a generic sketch of AdaGrad on a PL objective is given after this list.
Our framework can potentially be utilized in analyzing the linear convergence of other variants of Adam.
arXiv Detail & Related papers (2024-07-17T14:56:21Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based optimization method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints [10.578409461429626]
In this work, we adapt integral quadratic constraints theory to first-order methods for smooth and strongly monotone games.
We provide \emph{for the first time} a global convergence rate for the negative momentum method (NM) with a complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs [25.158203665218164]
We show that adaptive gradient methods can be faster than random shuffling SGD after finite time.
To the best of our knowledge, this is the first result to demonstrate that adaptive gradient methods can be faster than SGD after finite time.
arXiv Detail & Related papers (2020-06-12T09:39:47Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave min-max problems.
Our experiments on GAN training show that the advantage of adaptive over non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
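The AdaGrad/PL entry above concerns linear convergence of adaptive methods under the Polyak-Lojasiewicz inequality. As a generic illustration only (not the cited paper's algorithm, constants, or analysis), the sketch below runs diagonal AdaGrad on a strongly convex quadratic, which satisfies the PL inequality, and prints the suboptimality gap; the step size and problem data are assumptions for the example.

```python
# Minimal sketch: diagonal AdaGrad on f(w) = 0.5 * w^T A w - b^T w, a strongly
# convex quadratic (hence PL). Generic AdaGrad, not the cited paper's method.
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(10, 10))
A = M.T @ M + np.eye(10)          # positive definite, so the PL inequality holds
b = rng.normal(size=10)
w_star = np.linalg.solve(A, b)    # global minimizer, used to measure the gap

def f(w):
    return 0.5 * w @ A @ w - b @ w

def grad(w):
    return A @ w - b

def adagrad(w0, eta=1.0, steps=200, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate steps scaled by accumulated squared gradients."""
    w = w0.copy()
    G = np.zeros_like(w)           # running sum of squared gradients
    for t in range(steps):
        g = grad(w)
        G += g * g
        w -= eta * g / (np.sqrt(G) + eps)
        if t % 50 == 0:
            print(f"step {t:4d}  suboptimality gap {f(w) - f(w_star):.3e}")
    return w

adagrad(np.zeros(10))
```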