Hybrid Decentralized Optimization: First- and Zeroth-Order Optimizers
Can Be Jointly Leveraged For Faster Convergence
- URL: http://arxiv.org/abs/2210.07703v1
- Date: Fri, 14 Oct 2022 10:54:11 GMT
- Title: Hybrid Decentralized Optimization: First- and Zeroth-Order Optimizers
Can Be Jointly Leveraged For Faster Convergence
- Authors: Shayan Talaei, Giorgi Nadiradze, Dan Alistarh
- Abstract summary: We study settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system.
We show that such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process.
- Score: 25.211914690635183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed optimization has become one of the standard ways of speeding up
machine learning training, and most of the research in the area focuses on
distributed first-order, gradient-based methods. Yet, there are settings where
some computationally-bounded nodes may not be able to implement first-order,
gradient-based optimization, while they could still contribute to joint
optimization tasks. In this paper, we initiate the study of hybrid
decentralized optimization, studying settings where nodes with zeroth-order and
first-order optimization capabilities co-exist in a distributed system, and
attempt to jointly solve an optimization task over some data distribution. We
essentially show that, under reasonable parameter settings, such a system can
not only withstand noisier zeroth-order agents but can even benefit from
integrating such agents into the optimization process, rather than ignoring
their information. At the core of our approach is a new analysis of distributed
optimization with noisy and possibly-biased gradient estimators, which may be
of independent interest. Experimental results on standard optimization tasks
confirm our analysis, showing that hybrid first-zeroth order optimization can
be practical.
Related papers
- Primitive Agentic First-Order Optimization [0.0]
This work presents a proof-of-concept study combining primitive state representations and agent-environment interactions as first-order reinforcement learning.
The results show that elementary RL methods combined with succinct partial state representations can be used as optimizeds manage complexity in RL-based optimization.
arXiv Detail & Related papers (2024-06-07T11:13:38Z) - A Continuous Relaxation for Discrete Bayesian Optimization [17.312618575552]
We show that inference and optimization can be computationally tractable.
We consider in particular the optimization domain where very few observations and strict budgets exist.
We show that the resulting acquisition function can be optimized with both continuous or discrete optimization algorithms.
arXiv Detail & Related papers (2024-04-26T14:47:40Z) - Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO)
ZOPO incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization.
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
arXiv Detail & Related papers (2024-03-05T14:18:15Z) - Implicit Diffusion: Efficient Optimization through Stochastic Sampling [46.049117719591635]
We present a new algorithm to optimize distributions defined implicitly by parameterized diffusions.
We introduce a general framework for first-order optimization of these processes, that performs jointly.
We apply it to training energy-based models and finetuning denoising diffusions.
arXiv Detail & Related papers (2024-02-08T08:00:11Z) - Optimistic Optimization of Gaussian Process Samples [30.226274682578172]
A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in form of a dissimilarity function.
We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.
arXiv Detail & Related papers (2022-09-02T09:06:24Z) - Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Amalgamation: how can we best combine a pool of "teacher" amalgamations into a single "student" that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analyticals by gradient descent.
In order to reduce variance of the process, we also explore methods to stabilize the process by perturbing the target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z) - On Constraints in First-Order Optimization: A View from Non-Smooth
Dynamical Systems [99.59934203759754]
We introduce a class of first-order methods for smooth constrained optimization.
Two distinctive features of our approach are that projections or optimizations over the entire feasible set are avoided.
The resulting algorithmic procedure is simple to implement even when constraints are nonlinear.
arXiv Detail & Related papers (2021-07-17T11:45:13Z) - An Efficient Batch Constrained Bayesian Optimization Approach for Analog
Circuit Synthesis via Multi-objective Acquisition Ensemble [11.64233949999656]
We propose an efficient parallelizable Bayesian optimization algorithm via Multi-objective ACquisition function Ensemble (MACE)
Our proposed algorithm can reduce the overall simulation time by up to 74 times compared to differential evolution (DE) for the unconstrained optimization problem when the batch size is 15.
For the constrained optimization problem, our proposed algorithm can speed up the optimization process by up to 15 times compared to the weighted expected improvement based Bayesian optimization (WEIBO) approach, when the batch size is 15.
arXiv Detail & Related papers (2021-06-28T13:21:28Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine
Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction, and solution update.
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with trained Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper sub solver for lower bounding.
In total, we reduce time convergence by orders of orders of the proposed method.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In big search spaces the algorithm goes through several low function value regions before reaching the optimum of the function.
One approach to subside this cold start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that space gets expanded around the high probability region of function optimum and shrinks around low probability region of optimum.
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.