Optimizer Amalgamation
- URL: http://arxiv.org/abs/2203.06474v2
- Date: Tue, 15 Mar 2022 01:17:23 GMT
- Title: Optimizer Amalgamation
- Authors: Tianshu Huang, Tianlong Chen, Sijia Liu, Shiyu Chang, Lisa Amini,
Zhangyang Wang
- Abstract summary: We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
To reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the amalgamation target.
- Score: 124.33523126363728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selecting an appropriate optimizer for a given problem is of major interest
for researchers and practitioners. Many analytical optimizers have been
proposed using a variety of theoretical and empirical approaches; however, none
can offer a universal advantage over other competitive optimizers. We are thus
motivated to study a new problem named Optimizer Amalgamation: how can we best
combine a pool of "teacher" optimizers into a single "student" optimizer that
can have stronger problem-specific performance? In this paper, we draw
inspiration from the field of "learning to optimize" to use a learnable
amalgamation target. First, we define three differentiable amalgamation
mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
Then, in order to reduce variance of the amalgamation process, we also explore
methods to stabilize the amalgamation process by perturbing the amalgamation
target. Finally, we present experiments showing the superiority of our
amalgamated optimizer compared to its amalgamated components and learning to
optimize baselines, and the efficacy of our variance reducing perturbations.
Our code and pre-trained models are publicly available at
http://github.com/VITA-Group/OptimizerAmalgamation.
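The released implementation is in the repository above; as a reading aid only, here is a minimal sketch of one plausible differentiable amalgamation mechanism, a learnable softmax-weighted mixture of teacher updates. The two-teacher pool, the weighting scheme, and all names below are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of a softmax-weighted "optimizer amalgamation" step.
# The teacher pool and all names here are illustrative assumptions.
import torch

def teacher_updates(grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """Candidate updates from two analytical teachers:
    plain SGD and a simple RMSProp-style rule (assumed pool)."""
    state["v"] = beta * state["v"] + (1 - beta) * grad ** 2
    sgd_step = -lr * grad
    rms_step = -lr * grad / (state["v"].sqrt() + eps)
    return torch.stack([sgd_step, rms_step])            # (num_teachers, *param_shape)

# Learnable amalgamation weights; in the paper's setting these (or a small
# network producing them) would be meta-trained by gradient descent through
# the unrolled optimization trajectory (meta-training loop omitted here).
logits = torch.nn.Parameter(torch.zeros(2))

def amalgamated_step(param, grad, state):
    steps = teacher_updates(grad, state)
    w = torch.softmax(logits, dim=0)                     # convex combination of teachers
    update = (w.view(-1, *[1] * grad.dim()) * steps).sum(dim=0)
    return param + update                                # the "student" update

# usage: one student step on a toy quadratic loss ||p||^2
p = torch.randn(5)
state = {"v": torch.zeros_like(p)}
p = amalgamated_step(p, 2 * p, state)
```

Only the per-step student update is shown; the abstract describes three such differentiable mechanisms, of which this weighted mixture is just one plausible instance.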
Related papers
- Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning [69.95292905263393]
We show that gradient-based optimization and large language models (LLMs) are complementary to each other, suggesting a collaborative optimization approach.
Our code is released at https://www.guozix.com/guozix/LLM-catalyst.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - Robust expected improvement for Bayesian optimization [1.8130068086063336]
We propose a surrogate modeling and active learning technique called robust expected improvement (REI) that ports adversarial methodology into the BO/GP framework.
We illustrate and draw comparisons to several competitors on benchmark synthetic exercises and real problems of varying complexity.
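For context, the standard expected-improvement acquisition that REI builds on can be written in a few lines; the adversarially robust modification itself is not reproduced here, and the GP posterior inputs are assumed to come from any standard library.

```python
# Standard expected improvement (minimization form); REI replaces the plain
# improvement with an adversarially robust variant, which is omitted here.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """mu, sigma: GP posterior mean/std at candidate points;
    f_best: best observed objective value so far (minimization)."""
    sigma = np.maximum(sigma, 1e-12)                 # avoid division by zero
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```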
arXiv Detail & Related papers (2023-02-16T22:34:28Z) - Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
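A minimal sketch of the plain unrolling baseline that this analysis improves on: differentiate through a fixed number of inner gradient steps with automatic differentiation. The toy inner objective and all names are illustrative assumptions, not the paper's folded-optimization system.

```python
# Algorithm unrolling: backpropagate through T gradient steps of an inner
# solver via autograd, so the outer loss can be differentiated w.r.t. theta.
import torch

def unrolled_inner_solver(theta, x0, steps=20, lr=0.1):
    """Minimize the toy inner objective f(x; theta) = ||x - theta||^2 by
    gradient descent, keeping the graph so d x*/d theta is available."""
    x = x0
    for _ in range(steps):
        inner_loss = ((x - theta) ** 2).sum()
        (g,) = torch.autograd.grad(inner_loss, x, create_graph=True)
        x = x - lr * g
    return x

theta = torch.randn(3, requires_grad=True)
x = unrolled_inner_solver(theta, torch.zeros(3, requires_grad=True))
outer_loss = x.sum()
outer_loss.backward()            # gradient flows through all unrolled steps
print(theta.grad)                # d(outer_loss)/d(theta) via unrolling
```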
arXiv Detail & Related papers (2023-01-28T01:50:42Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
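A minimal sketch of ZO-signGD under common assumptions: estimate the gradient of a black-box objective from forward-difference queries along random directions, then step with its sign. The query count, smoothing parameter, and step size below are illustrative, not the paper's settings.

```python
# Zeroth-order sign-based gradient descent (ZO-signGD) sketch.
import numpy as np

def zo_sign_gd_step(f, x, lr=0.01, mu=0.01, num_queries=20, rng=None):
    """One ZO-signGD step on a black-box objective f (minimization)."""
    rng = rng if rng is not None else np.random.default_rng()
    grad_est = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)                 # random direction
        grad_est += (f(x + mu * u) - f(x)) / mu * u      # forward-difference estimate
    grad_est /= num_queries
    return x - lr * np.sign(grad_est)                    # sign step

# usage on a toy quadratic standing in for a molecular objective
rng = np.random.default_rng(0)
x = np.ones(5)
for _ in range(200):
    x = zo_sign_gd_step(lambda z: np.sum(z ** 2), x, rng=rng)
```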
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Optimistic Optimization of Gaussian Process Samples [30.226274682578172]
A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in form of a dissimilarity function.
We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.
arXiv Detail & Related papers (2022-09-02T09:06:24Z) - Teaching Networks to Solve Optimization Problems [13.803078209630444]
We propose to replace the iterative solvers altogether with a trainable parametric set function.
We show the feasibility of learning such parametric (set) functions to solve various classic optimization problems.
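A hedged sketch of what such a trainable parametric set function might look like: a permutation-invariant (DeepSets-style) network that maps a set of problem parameters directly to a candidate solution, replacing the iterative solver. The architecture and sizes are assumptions, not the paper's model.

```python
# Permutation-invariant "set" network mapping problem parameters to a solution.
import torch
import torch.nn as nn

class SetSolver(nn.Module):
    def __init__(self, in_dim=2, hidden=64, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, items):                 # items: (batch, set_size, in_dim)
        return self.rho(self.phi(items).sum(dim=1))   # sum-pool, then decode

net = SetSolver()
solution = net(torch.randn(8, 10, 2))         # 8 problem instances of 10 items each
# training could supervise on solutions from a classical solver, or use the
# optimization objective itself as the loss.
```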
arXiv Detail & Related papers (2022-02-08T19:13:13Z) - SnAKe: Bayesian Optimization with Pathwise Exploration [9.807656882149319]
We consider a novel setting where the expense of evaluating the function can increase significantly when making large input changes between iterations.
This paper investigates the problem and introduces 'Sequential Bayesian Optimization via Adaptive Connecting Samples' (SnAKe).
It provides a solution by considering future queries and preemptively building optimization paths that minimize input costs.
arXiv Detail & Related papers (2022-01-31T19:42:56Z) - Divide and Learn: A Divide and Conquer Approach for Predict+Optimize [50.03608569227359]
The predict+optimize problem combines machine learning of problem coefficients with an optimization problem that uses the predicted coefficients.
We show how to directly express the loss of the optimization problem in terms of the predicted coefficients as a piece-wise linear function.
We propose a novel divide and conquer algorithm to tackle optimization problems without this restriction and predict its coefficients using the optimization loss.
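To make the predict+optimize loss concrete, here is a toy regret computation for a linear program: solve with predicted coefficients, then evaluate that decision under the true ones. The paper's piece-wise linear reformulation of this loss is not reproduced; the instance below is an assumption for illustration.

```python
# Regret of optimizing with predicted rather than true LP coefficients.
import numpy as np
from scipy.optimize import linprog

def regret(c_true, c_pred, A_ub, b_ub):
    """Extra true cost incurred by the decision made with predicted costs."""
    x_pred = linprog(c_pred, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1)).x
    x_true = linprog(c_true, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1)).x
    return c_true @ x_pred - c_true @ x_true      # >= 0 by optimality of x_true

# toy instance: select at least two of four items, minimizing total cost
A_ub = -np.ones((1, 4)); b_ub = np.array([-2.0])
c_true = np.array([1.0, 2.0, 3.0, 4.0])
c_pred = np.array([4.0, 3.0, 2.0, 1.0])           # mispredicted ordering
print(regret(c_true, c_pred, A_ub, b_ub))         # true cost 7 vs optimal 3 -> regret 4
```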
arXiv Detail & Related papers (2020-12-04T00:26:56Z) - Reverse engineering learned optimizers reveals known and novel
mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z) - Descending through a Crowded Valley - Benchmarking Deep Learning
Optimizers [29.624308090226375]
In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed anecdotes.
To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers.
Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods.
arXiv Detail & Related papers (2020-07-03T08:19:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.