Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets
- URL: http://arxiv.org/abs/1912.11940v2
- Date: Fri, 25 Dec 2020 02:17:20 GMT
- Title: Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets
- Authors: Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui,
Payel Das, Tianbao Yang
- Abstract summary: Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Stochastic Gradient (OSG) algorithm for non-convex non-concave min-max problems and propose an adaptive variant, Optimistic Adagrad (OAdagrad).
Our experiments show that adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training, which can be explained by the slow growth of the cumulative stochastic gradient observed empirically.
- Score: 71.05306664267832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive gradient algorithms perform gradient-based updates using the history
of gradients and are ubiquitous in training deep neural networks. While the
theory of adaptive gradient methods is well understood for minimization problems,
the underlying factors driving their empirical success in min-max problems such
as GANs remain unclear. In this paper, we aim at bridging this gap from both
theoretical and empirical perspectives. First, we analyze a variant of
Optimistic Stochastic Gradient (OSG) proposed in~\citep{daskalakis2017training}
for solving a class of non-convex non-concave min-max problems and establish an
$O(\epsilon^{-4})$ complexity for finding an $\epsilon$-first-order stationary
point, in which the algorithm requires invoking only one stochastic first-order
oracle while enjoying the state-of-the-art iteration complexity achieved by the
stochastic extragradient method of~\citep{iusem2017extragradient}. Then we
propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and
reveal an \emph{improved} adaptive complexity
$O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes
the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq
1/2$. To the best of our knowledge, this is the first work for establishing
adaptive complexity in non-convex non-concave min-max optimization.
Empirically, our experiments show that indeed adaptive gradient algorithms
outperform their non-adaptive counterparts in GAN training. Moreover, this
observation can be explained by the slow growth rate of the cumulative
stochastic gradient, as observed empirically.
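As a concrete illustration of the updates the abstract describes, the following is a minimal NumPy sketch of an optimistic stochastic gradient step with Adagrad-style per-coordinate scaling, in the spirit of OSG/OAdagrad, applied to a toy bilinear min-max problem. The toy objective, step size, noise level, and the crude log-log fit used to estimate the growth exponent $\alpha$ are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
# Illustrative sketch (not the paper's exact algorithm): optimistic gradient
# updates with Adagrad-style per-coordinate scaling on a toy bilinear game
#     min_x max_y  f(x, y) = x^T A y
# The growth of the cumulative stochastic gradient, which alpha characterizes
# in the abstract, is tracked here via the Adagrad accumulator.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))

def stochastic_grad(x, y, noise=0.1):
    """Noisy gradients of f(x, y) = x^T A y; y-component negated for ascent."""
    gx = A @ y + noise * rng.standard_normal(d)
    gy = -(A.T @ x) + noise * rng.standard_normal(d)
    return np.concatenate([gx, gy])

z = rng.standard_normal(2 * d)      # joint iterate z = (x, y)
g_prev = np.zeros(2 * d)            # gradient from the previous iteration
G2 = np.zeros(2 * d)                # Adagrad accumulator of squared gradients
eta, eps = 0.1, 1e-8
cum_grad_norm = []

for t in range(2000):
    g = stochastic_grad(z[:d], z[d:])
    G2 += g ** 2
    # Optimistic step: extrapolate with 2*g_t - g_{t-1}, scaled per coordinate.
    z -= eta * (2 * g - g_prev) / (np.sqrt(G2) + eps)
    g_prev = g
    cum_grad_norm.append(np.linalg.norm(np.sqrt(G2)))

# Crude estimate of the growth exponent alpha: slope of a log-log fit of the
# cumulative gradient norm against the iteration count.
ts = np.arange(1, len(cum_grad_norm) + 1)
alpha_hat = np.polyfit(np.log(ts[100:]), np.log(cum_grad_norm[100:]), 1)[0]
print(f"final ||(x, y)|| = {np.linalg.norm(z):.3f}, estimated alpha = {alpha_hat:.2f}")
```

A slow empirical growth of the accumulator (a small estimated $\alpha$) is exactly the regime in which the adaptive $O(\epsilon^{-2/(1-\alpha)})$ complexity above improves on the non-adaptive $O(\epsilon^{-4})$ bound.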
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in data learning, such as text classification using neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and
Nonsmooth Bi-level Optimization [26.68351521813062]
Existing bi-level algorithms cannot handle nonsmooth or hyper-smooth regularizers.
In this paper, we show that an approximate implicit differentiation (AID) scheme can be used to speed up a broad range of machine learning applications.
arXiv Detail & Related papers (2022-03-30T18:53:04Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely Variance Reduced Langevin Dynamics and Recursive Gradient Langevin Dynamics (a generic update sketch appears after this list).
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Convergence of First-Order Methods for Constrained Nonconvex
Optimization with Dependent Data [7.513100214864646]
We show a worst-case convergence rate of $\tilde{O}(t^{-1/4})$, and a complexity of $\tilde{O}(\varepsilon^{-4})$ in terms of the Moreau envelope, for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data, based on projected gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NL).
The new definitions inspire geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite
Epochs [25.158203665218164]
We show that adaptive gradient methods can be faster than random shuffling SGD after finite time.
To the best of our knowledge, this is the first work to demonstrate that adaptive gradient methods can be faster than SGD after finite time.
arXiv Detail & Related papers (2020-06-12T09:39:47Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
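The Langevin-dynamics entry above names two variance-reduced variants; as a generic point of reference (see the note in that entry), here is a minimal sketch of the plain Stochastic Gradient Langevin Dynamics update together with an SVRG-style variance-reduced gradient estimator on a toy least-squares problem. This is a standard textbook-style construction under assumed step sizes and temperature, not the specific algorithms analyzed in that paper.

```python
# Illustrative sketch: plain SGLD and an SVRG-style variance-reduced gradient
# estimator on a toy least-squares objective (not that paper's exact methods).
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

def batch_grad(theta, idx):
    """Mini-batch average gradient of the per-example squared losses."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / len(idx)

def sgld(theta, steps=500, eta=1e-3, beta=1e4, batch=10):
    """Plain SGLD: stochastic gradient step plus noise scaled by sqrt(2*eta/beta)."""
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        noise = np.sqrt(2 * eta / beta) * rng.standard_normal(d)
        theta = theta - eta * n * batch_grad(theta, idx) + noise
    return theta

def svrg_langevin(theta, epochs=10, inner=50, eta=1e-3, beta=1e4, batch=10):
    """Langevin update with an SVRG-style estimator: g_B(theta) - g_B(ref) + full(ref)."""
    for _ in range(epochs):
        ref = theta.copy()
        full = batch_grad(ref, np.arange(n))   # full gradient at the anchor point
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            g = batch_grad(theta, idx) - batch_grad(ref, idx) + full
            noise = np.sqrt(2 * eta / beta) * rng.standard_normal(d)
            theta = theta - eta * n * g + noise
    return theta

theta0 = np.zeros(d)
print("SGLD estimate:         ", sgld(theta0.copy()))
print("SVRG-Langevin estimate:", svrg_langevin(theta0.copy()))
```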