Optimization using Parallel Gradient Evaluations on Multiple Parameters
- URL: http://arxiv.org/abs/2302.03161v1
- Date: Mon, 6 Feb 2023 23:39:13 GMT
- Title: Optimization using Parallel Gradient Evaluations on Multiple Parameters
- Authors: Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya
Mazumdar
- Abstract summary: We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optima.
- Score: 51.64614793990665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a first-order method for convex optimization, where instead of
being restricted to the gradient from a single parameter, gradients from
multiple parameters can be used during each step of gradient descent. This
setup is particularly useful when a few processors are available that can be
used in parallel for optimization. Our method uses gradients from multiple
parameters in synergy to update these parameters together towards the optima.
While doing so, it is ensured that the computational and memory complexity is
of the same order as that of gradient descent. Empirical results demonstrate
that even using gradients from as few as *two* parameters, our method
can often obtain significant acceleration and provide robustness to
hyper-parameter settings. We remark that the primary goal of this work is less
theoretical, and is instead aimed at exploring the understudied case of using
multiple gradients during each step of optimization.
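The paper's exact update rule is not reproduced in this summary, so the following is a minimal, hypothetical sketch (in JAX) of the general setup the abstract describes: gradients are evaluated at several parameter vectors in parallel and then pooled, so that all points are updated together toward the optimum of a convex objective. The objective `f`, the pooling by averaging, and the 50/50 mixing coefficient are illustrative assumptions, not the authors' method.

```python
import jax
import jax.numpy as jnp

def f(x):                                     # example convex objective (assumed)
    return jnp.sum((x - 3.0) ** 2)

parallel_grads = jax.vmap(jax.grad(f))        # gradients at all k points in one call

def step(params, lr=0.1):
    # params: (k, d) array holding k parameter vectors.
    grads = parallel_grads(params)            # (k, d) gradients, evaluated in parallel
    shared = grads.mean(axis=0)               # one simple way to pool the k gradients
    # Hypothetical combination rule (not the paper's): each point follows a
    # mix of its own gradient and the pooled gradient.
    return params - lr * (0.5 * grads + 0.5 * shared)

params = jnp.array([[0.0, 0.0], [6.0, 6.0]])  # k = 2 parameter vectors
for _ in range(100):
    params = step(params)
# Both rows converge toward the optimum at (3, 3).
```

With k = 2, per-step compute and memory stay within a small constant factor of ordinary gradient descent, consistent with the complexity claim in the abstract.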
Related papers
- Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients [0.08388591755871733]
Forward gradients approximate the true gradient using directional derivatives along random tangents computed by forward-mode automatic differentiation (a minimal single-tangent version is sketched after this list).
This paper provides an in-depth analysis of multi-tangent forward gradients and introduces an improved, projection-based approach to combining the forward gradients from multiple tangents.
arXiv Detail & Related papers (2024-10-23T11:02:59Z) - Multi-fidelity Constrained Optimization for Stochastic Black Box
Simulators [1.6385815610837167]
We introduce the algorithm Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the issues mentioned earlier.
Scout-Nd efficiently estimates the gradient, reduces the noise of the estimator gradient, and applies multi-fidelity schemes to further reduce computational effort.
We validate our approach on standard benchmarks, demonstrating its effectiveness at optimizing parameters and showing better performance than existing methods.
arXiv Detail & Related papers (2023-11-25T23:36:38Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ through situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Online Hyperparameter Meta-Learning with Hypergradient Distillation [59.973770725729636]
Gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization.
We propose a novel HO method that can overcome these limitations, by approximating the second-order term with knowledge distillation.
arXiv Detail & Related papers (2021-10-06T05:14:53Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient
Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Gradient-based Hyperparameter Optimization Over Long Horizons [2.28438857884398]
Forward-mode differentiation with sharing (FDS) is a simple and efficient algorithm which tackles memory scaling issues with forward-mode differentiation.
We demonstrate its efficiency empirically by differentiating through $\sim 10^4$ gradient steps of unrolled optimization.
arXiv Detail & Related papers (2020-07-15T17:44:07Z)
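As a companion to the multi-tangent forward-gradient paper listed first above, here is a minimal single-tangent sketch in JAX; it shows only the basic forward-gradient estimator, not that paper's projection-based combination of multiple tangents. The objective `f` is an illustrative assumption.

```python
import jax
import jax.numpy as jnp

def f(x):                                     # example smooth objective (assumed)
    return jnp.sum(jnp.sin(x) + x ** 2)

def forward_gradient(key, x):
    v = jax.random.normal(key, x.shape)       # random tangent direction
    _, dir_deriv = jax.jvp(f, (x,), (v,))     # directional derivative: grad f(x) . v
    return dir_deriv * v                      # forward gradient; unbiased for v ~ N(0, I)

key = jax.random.PRNGKey(0)
x = jnp.ones(4)
g_hat = forward_gradient(key, x)              # estimate of grad f at x, no backward pass
```

Averaging such estimates over several tangents reduces variance; the paper above studies how best to combine them.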