Adaptive Proximal Gradient Method for Convex Optimization
- URL: http://arxiv.org/abs/2308.02261v2
- Date: Mon, 12 Feb 2024 16:05:24 GMT
- Title: Adaptive Proximal Gradient Method for Convex Optimization
- Authors: Yura Malitsky, Konstantin Mishchenko
- Abstract summary: We explore two fundamental first-order algorithms in convex optimization, namely gradient descent (GD) and proximal gradient method (ProxGD).
Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions.
We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs.
- Score: 18.681222155879656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore two fundamental first-order algorithms in convex
optimization, namely, gradient descent (GD) and proximal gradient method
(ProxGD). Our focus is on making these algorithms entirely adaptive by
leveraging local curvature information of smooth functions. We propose adaptive
versions of GD and ProxGD that are based on observed gradient differences and,
thus, have no added computational costs. Moreover, we prove convergence of our
methods assuming only local Lipschitzness of the gradient. In addition, the
proposed versions allow for even larger stepsizes than those initially
suggested in [MM20].
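The abstract does not reproduce the stepsize rule itself, so the following is a minimal sketch of a linesearch-free proximal gradient loop in the spirit of [MM20], where a local curvature estimate is formed from observed gradient differences. The function names (`adaptive_prox_gd`, `soft_threshold`), the growth cap sqrt(1 + theta_{k-1}) * lambda_{k-1}, the local estimate ||x_k - x_{k-1}|| / (2 ||grad f(x_k) - grad f(x_{k-1})||), and the LASSO usage example are illustrative assumptions, not the paper's exact method (whose stepsizes are stated to be larger than those of [MM20]).
```python
import numpy as np

def adaptive_prox_gd(grad_f, prox_g, x0, lam0=1e-10, n_iters=500):
    """Linesearch-free proximal gradient loop with stepsizes estimated from
    observed gradient differences ([MM20]-style rule; the constants in the
    present paper are reported to be less conservative)."""
    x_prev, g_prev = x0, grad_f(x0)
    lam_prev, theta = lam0, np.inf                    # theta_0 = +inf as in [MM20]
    x = prox_g(x_prev - lam_prev * g_prev, lam_prev)  # tiny warm-up step
    for _ in range(n_iters):
        g = grad_f(x)
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        # Inverse of the observed local Lipschitz estimate ||dg|| / ||dx||.
        local = dx / (2.0 * dg) if dg > 0 else np.inf
        lam = min(np.sqrt(1.0 + theta) * lam_prev, local)
        x_prev, g_prev = x, g
        x = prox_g(x - lam * g, lam)                  # proximal gradient step
        theta, lam_prev = lam / lam_prev, lam
    return x

# Hypothetical usage on a LASSO-type problem:
#   f(x) = 0.5 * ||A x - b||^2 (smooth part),  g(x) = mu * ||x||_1 (prox-friendly part).
def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(0)
A, b, mu = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
x_hat = adaptive_prox_gd(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_g=lambda z, lam: soft_threshold(z, lam * mu),
    x0=np.zeros(100),
)
```
The only per-iteration overhead relative to plain ProxGD is keeping the previous iterate and gradient in memory, which is consistent with the abstract's claim of no added computational cost.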
Related papers
- Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient [4.478941279527423]
Backtracking linesearch is the de facto approach for minimizing continuously differentiable functions with a locally Lipschitz gradient (a sketch of one such backtracking step appears after this list).
In recent years, it has been shown that in the convex setting it is possible to avoid linesearch altogether.
We propose an adaptive proximal gradient method, adaPG, that uses novel estimates of the local smoothness modulus.
arXiv Detail & Related papers (2023-01-11T12:19:28Z) - Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size [29.15132344744801]
We establish local convergence for gradient descent with adaptive step size for problems such as matrix inversion.
We show that these first-order optimization methods can achieve sub-linear or linear convergence.
arXiv Detail & Related papers (2021-12-30T00:50:30Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Projection-Free Adaptive Gradients for Large-Scale Optimization [22.0439695290991]
Frank-Wolfe algorithms occupy a unique position as they alleviate both computational burdens by querying only approximate first-order information from the objective.
We show that our method can improve the performance of adaptive algorithms for constrained optimization.
arXiv Detail & Related papers (2020-09-29T15:56:12Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over their non-adaptive counterparts in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence guarantees in expectation for adaptive gradient methods.
Our analyses shed light on a better understanding of adaptive gradient methods in optimizing nonconvex objectives.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
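For contrast with the linesearch-free methods above (the present paper and adaPG), here is the sketch promised in the adaPG entry: one standard backtracking proximal gradient step. The halving factor `beta`, the initial trial stepsize, and the quadratic upper-bound acceptance test are conventional textbook choices and are not taken from any of the listed papers.
```python
import numpy as np

def backtracking_prox_step(f, grad_f, prox_g, x, lam=1.0, beta=0.5, max_tries=50):
    """One proximal gradient step with backtracking linesearch.

    Halves the trial stepsize until the standard quadratic upper bound
        f(x_new) <= f(x) + <grad f(x), x_new - x> + ||x_new - x||^2 / (2 * lam)
    holds, then returns the accepted point and stepsize."""
    fx, gx = f(x), grad_f(x)
    for _ in range(max_tries):
        x_new = prox_g(x - lam * gx, lam)
        diff = x_new - x
        if f(x_new) <= fx + gx @ diff + (diff @ diff) / (2.0 * lam):
            return x_new, lam
        lam *= beta                      # shrink and retry
    return x_new, lam
```
Every rejected trial costs an extra objective evaluation and prox call; removing this inner loop is precisely what the adaptive stepsize rules above aim for.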