Parameter-free version of Adaptive Gradient Methods for Strongly-Convex Functions
- URL: http://arxiv.org/abs/2306.06613v2
- Date: Sat, 15 Jul 2023 03:21:29 GMT
- Title: Parameter-free version of Adaptive Gradient Methods for Strongly-Convex Functions
- Authors: Deepak Gouda, Hassan Naveed, Salil Kamath
- Abstract summary: In this paper, we adapt a universal algorithm along the lines of MetaGrad to remove the dependence on $\lambda$ and $\eta$.
The main idea is to concurrently run multiple experts and combine their predictions through a master algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The optimal learning rate for adaptive gradient methods applied to
$\lambda$-strongly convex functions relies on the strong-convexity parameter
$\lambda$ and the learning rate $\eta$. In this paper, we adapt a universal
algorithm along the lines of MetaGrad to remove this dependence on $\lambda$
and $\eta$. The main idea is to concurrently run multiple experts and combine
their predictions through a master algorithm. This master enjoys an
$O(d \log T)$ regret bound.
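As a rough illustration of the expert/master construction described above, the sketch below runs a few online-gradient-descent experts over a hypothetical grid of learning rates and lets a simple exponential-weights master reweight them using linearized losses. The grid `etas`, the master step `master_lr`, and the aggregation rule are assumptions for illustration; this is not the paper's algorithm, nor MetaGrad's tilted-exponential-weights update.

```python
import numpy as np

# Minimal sketch of the expert/master idea: several experts with different
# learning-rate schedules, combined by an exponential-weights master.
def run_master(grad_fn, dim, T=2000, etas=(0.01, 0.1, 1.0), master_lr=0.5):
    experts = [np.zeros(dim) for _ in etas]       # one iterate per expert
    w = np.ones(len(etas)) / len(etas)            # master's weights over experts
    for t in range(1, T + 1):
        x = w @ np.vstack(experts)                # master plays the weighted average
        g = grad_fn(x)                            # (sub)gradient at the master's point
        # linearized loss of each expert relative to the master: g . (x_i - x)
        ell = np.array([g @ (xi - x) for xi in experts])
        w = w * np.exp(-master_lr * ell)          # exponential-weights update
        w = w / w.sum()
        # each expert takes a gradient step with its own learning-rate schedule
        experts = [xi - (eta / t) * g for xi, eta in zip(experts, etas)]
    return w @ np.vstack(experts)

# Example: minimize the 1-strongly-convex quadratic f(x) = 0.5 * ||x - 1||^2
# without handing the algorithm lambda or a tuned eta.
print(run_master(lambda x: x - 1.0, dim=3))
```

Hedging over the experts costs only a logarithmic factor in their number; the $O(d \log T)$ guarantee quoted above comes from the paper's specific choice of experts and master, which this sketch does not reproduce.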
Related papers
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaption method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Efficient Lipschitzian Global Optimization of Hölder Continuous Multivariate Functions [0.0]
This study presents an effective global optimization technique designed for multivariate functions that are Hölder continuous.
We show that the algorithm attains an average regret bound of $O(T^{-\frac{\alpha}{n}})$ for optimizing a Hölder continuous target function with Hölder exponent $\alpha$ in an $n$-dimensional space within a given time horizon $T$.
arXiv Detail & Related papers (2023-03-24T22:29:35Z)
- Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together toward the optimum (a generic sketch of this idea follows this entry).
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
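As a generic, assumption-laden sketch of using gradients from several parameter vectors in one step (not the specific update rule of the paper above), the snippet below lets every copy take its own gradient step while sharing a single AdaGrad-style accumulator built from all of the gradients.

```python
import numpy as np

# Generic illustration: gradients at several parameter copies feed one shared
# accumulator, which sets a common step size for all copies.  The sharing rule
# is an assumption for illustration, not the paper's method.
def parallel_gradient_step(grad_fn, params, accum, lr=0.5, eps=1e-8):
    grads = [grad_fn(p) for p in params]               # gradient at every copy
    accum += sum(float(np.sum(g * g)) for g in grads)  # shared curvature estimate
    step = lr / (np.sqrt(accum) + eps)
    return [p - step * g for p, g in zip(params, grads)], accum

# Usage: three copies of the parameters of f(x) = 0.5 * ||x||^2.
params = [np.array([3.0, -1.0]), np.array([0.5, 2.0]), np.array([-2.0, -2.0])]
accum = 0.0
for _ in range(500):
    params, accum = parallel_gradient_step(lambda x: x, params, accum)
print(params)  # every copy approaches the shared minimizer at the origin
```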
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the joint updating of the parameters and the learning rate as a single optimization process (a generic sketch follows this entry).
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
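One generic way to picture a jointly updated, regularized learning rate is a hypergradient-style update with an L2 penalty on the learning rate. The penalty `mu`, the step `hyper_lr`, and the alternating scheme below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Generic sketch: alternate between a hypergradient update of a regularized
# learning rate and an ordinary gradient step on the parameters.
def train(grad_fn, x0, eta0=0.1, hyper_lr=0.01, mu=1e-3, steps=200):
    x = np.asarray(x0, dtype=float)
    eta, g_prev = eta0, np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        # d f(x_t) / d eta is approximately -g_t . g_{t-1};
        # the mu * eta term comes from an L2 penalty on the learning rate.
        eta = max(eta - hyper_lr * (-(g @ g_prev) + mu * eta), 1e-6)
        x = x - eta * g
        g_prev = g
    return x, eta

# Usage: quadratic bowl f(x) = 0.5 * ||x||^2, starting from a deliberately small eta.
x_final, eta_final = train(lambda x: x, np.array([5.0, -3.0]), eta0=0.01)
print(x_final, eta_final)
```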
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Adaptive extra-gradient methods for min-max optimization and games [35.02879452114223]
We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations.
Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not.
It converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones.
arXiv Detail & Related papers (2020-10-22T22:54:54Z)
- Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities [32.51470158863247]
Our main algorithms, AdaACSA and AdaAGD+, are accelerated methods for constrained convex optimization.
We complement them with a simpler algorithm, AdaGrad+, which enjoys the same features and achieves the standard non-accelerated convergence rate.
arXiv Detail & Related papers (2020-07-17T09:10:21Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions (a small sketch of the implicit-differentiation step follows this entry).
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
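To make the implicit-differentiation step concrete: with scikit-learn's Lasso objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, the stationarity condition on the active set $S$ gives $\partial\beta_S/\partial\lambda = -n\,(X_S^\top X_S)^{-1}\operatorname{sign}(\beta_S)$, and a hypergradient of a validation loss follows by the chain rule. The sketch below (the helper `lasso_hypergradient` and the squared validation loss are assumptions) solves the linear system directly rather than forming an inverse; the paper's algorithm is more refined than this.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_hypergradient(X_tr, y_tr, X_val, y_val, lam):
    n = X_tr.shape[0]
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X_tr, y_tr).coef_
    S = np.flatnonzero(beta)                  # active set of the Lasso solution
    if S.size == 0:
        return 0.0                            # solution is identically zero
    Xs = X_tr[:, S]
    # Solve (X_S^T X_S) J = -n * sign(beta_S) instead of forming the inverse.
    J = np.linalg.solve(Xs.T @ Xs, -n * np.sign(beta[S]))
    # Chain rule through the validation loss (1/(2m)) ||X_val beta - y_val||^2.
    resid = X_val[:, S] @ beta[S] - y_val
    return (X_val[:, S].T @ resid) @ J / len(y_val)

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X, w = rng.standard_normal((100, 20)), np.zeros(20)
w[:3] = [1.0, -2.0, 0.5]
y = X @ w + 0.1 * rng.standard_normal(100)
print(lasso_hypergradient(X[:70], y[:70], X[70:], y[70:], lam=0.1))
```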
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
- On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove that adaptive gradient methods converge in expectation to first-order stationary points of nonconvex objectives.
Our analyses shed light on better understanding the behavior of adaptive gradient methods in nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)