Descending through a Crowded Valley - Benchmarking Deep Learning
Optimizers
- URL: http://arxiv.org/abs/2007.01547v6
- Date: Tue, 10 Aug 2021 23:17:21 GMT
- Authors: Robin M. Schmidt, Frank Schneider, Philipp Hennig
- Abstract summary: In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics.
To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers.
Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods.
- Score: 29.624308090226375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choosing the optimizer is considered to be among the most crucial design
decisions in deep learning, and it is not an easy one. The growing literature
now lists hundreds of optimization methods. In the absence of clear theoretical
guidance and conclusive empirical evidence, the decision is often made based on
anecdotes. In this work, we aim to replace these anecdotes, if not with a
conclusive ranking, then at least with evidence-backed heuristics. To do so, we
perform an extensive, standardized benchmark of fifteen particularly popular
deep learning optimizers while giving a concise overview of the wide range of
possible choices. Analyzing more than 50,000 individual runs, we contribute
the following three points: (i) Optimizer performance varies greatly across
tasks. (ii) We observe that evaluating multiple optimizers with default
parameters works approximately as well as tuning the hyperparameters of a
single, fixed optimizer. (iii) While we cannot discern an optimization method
clearly dominating across all tested tasks, we identify a significantly reduced
subset of specific optimizers and parameter choices that generally lead to
competitive results in our experiments: Adam remains a strong contender, with
newer methods failing to significantly and consistently outperform it. Our
open-sourced results are available as challenging and well-tuned baselines for
more meaningful evaluations of novel optimization methods without requiring any
further computational efforts.
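Finding (ii) suggests a simple practical recipe: rather than tuning one optimizer's hyperparameters, spend a comparable budget trying several optimizers at their defaults. Below is a minimal sketch of that recipe in PyTorch; the toy regression task, step budget, and candidate pool are illustrative assumptions, not the paper's benchmark protocol.

    import torch

    def train_with(opt_cls, steps=200):
        """Train a tiny model with one optimizer at default hyperparameters."""
        torch.manual_seed(0)
        x = torch.randn(256, 10)
        y = x @ torch.randn(10, 1)            # toy regression targets
        model = torch.nn.Linear(10, 1)
        opt = opt_cls(model.parameters())     # defaults only, no tuning
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()
            opt.step()
        return loss.item()

    # "Tune by trying optimizers" instead of tuning one optimizer's knobs.
    candidates = [torch.optim.Adam, torch.optim.AdamW,
                  torch.optim.RMSprop, torch.optim.Adagrad]
    results = {cls.__name__: train_with(cls) for cls in candidates}
    print(min(results, key=results.get), results)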
Related papers
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
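As a rough illustration of the blending idea summarized above, the sketch below mixes a logistic and an exponential preference loss with a sigmoid gate on the log-ratio margin; the gating form and the temperature tau are assumptions for illustration, not necessarily the paper's exact discovered objective.

    import torch

    def blended_preference_loss(rho, tau=0.05):
        """rho: log-ratio margin between chosen and rejected responses."""
        logistic = torch.nn.functional.softplus(-rho)   # -log(sigmoid(rho))
        exponential = torch.exp(-rho)
        gate = torch.sigmoid(rho / tau)                 # adaptive mixing weight
        return (gate * logistic + (1.0 - gate) * exponential).mean()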
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- A Survey on Multi-Objective based Parameter Optimization for Deep Learning [1.3223682837381137]
We focus on exploring the effectiveness of multi-objective optimization strategies for parameter optimization in conjunction with deep neural networks.
The two approaches are combined to provide valuable insights into the generation of predictions and their analysis in multiple applications.
arXiv Detail & Related papers (2023-05-17T07:48:54Z)
- Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks [2.8961929092154697]
We test the performance of various optimizers on deep learning models for source code.
We find that the choice of an optimizer can have a significant impact on the model quality.
We suggest that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
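Acting on that suggestion is a one-line change in PyTorch, which ships an RAdam implementation in torch.optim; the linear model below is only a hypothetical stand-in for a real ML4SE model.

    import torch

    model = torch.nn.Linear(128, 1)  # placeholder for a code-related model
    # opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # common default
    opt = torch.optim.RAdam(model.parameters(), lr=1e-3)    # suggested swap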
arXiv Detail & Related papers (2023-03-06T22:49:20Z)
- Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective [0.0]
Constrained multiobjective optimization problems (CMOPs) remain poorly understood.
The choice of adequate CMOPs for benchmarking is difficult and lacks a formal background.
This paper presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization.
arXiv Detail & Related papers (2023-02-04T14:12:30Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
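The sketch below illustrates only the concept in that summary: a small neural network that ingests gradients and outputs parameter updates. The per-parameter MLP and single gradient feature are simplifying assumptions and bear little resemblance to VeLO's actual architecture; the meta-training outer loop that would fit the network's weights across many tasks is omitted.

    import torch

    class TinyLearnedOptimizer(torch.nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            # Maps each gradient entry to an additive parameter update.
            self.net = torch.nn.Sequential(
                torch.nn.Linear(1, hidden),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden, 1),
            )

        @torch.no_grad()
        def step(self, params):
            for p in params:
                if p.grad is not None:
                    g = p.grad.reshape(-1, 1)              # ingest gradients
                    p.add_(self.net(g).reshape(p.shape))   # output updates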
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
To reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the amalgamation target.
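As a hedged sketch of what amalgamating analytical optimizers by gradient descent could look like, the snippet below forms the student update as a softmax-weighted mix of three teacher directions, with mixing logits that could themselves be trained end to end; the teacher pool and weighting scheme are illustrative assumptions, not necessarily the paper's three mechanisms.

    import torch

    def amalgamated_step(param, grad, state, logits, lr=1e-2):
        sgd = -grad                              # teacher 1: gradient descent
        sign = -grad.sign()                      # teacher 2: signSGD-style
        state["m"] = 0.9 * state.get("m", torch.zeros_like(grad)) + grad
        mom = -state["m"]                        # teacher 3: momentum
        w = torch.softmax(logits, dim=0)         # differentiable mixing weights
        return param + lr * (w[0] * sgd + w[1] * sign + w[2] * mom)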
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
- Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)
- Fast Rates for Contextual Linear Optimization [52.39202699484225]
We show that a naive plug-in approach achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance.
Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, simple to interpret, and, as we show, lead to decisions that perform very well.
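A minimal sketch of that plug-in recipe: first fit a predictive model for the unknown cost vector from context, then plug the prediction into the downstream linear optimization. The least-squares predictor and the toy decision set (pick the cheapest of three actions) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))                 # observed contexts
    W = rng.normal(size=(5, 3))
    C = X @ W + 0.1 * rng.normal(size=(500, 3))   # observed costs of 3 actions

    W_hat = np.linalg.lstsq(X, C, rcond=None)[0]  # step 1: predict costs

    x_new = rng.normal(size=5)                    # step 2: plug in and optimize
    c_hat = x_new @ W_hat
    print("chosen action:", int(np.argmin(c_hat)))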
arXiv Detail & Related papers (2020-11-05T18:43:59Z)
- Reverse engineering learned optimizers reveals known and novel mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)