Judging Adam: Studying the Performance of Optimization Methods on ML4SE
Tasks
- URL: http://arxiv.org/abs/2303.03540v1
- Date: Mon, 6 Mar 2023 22:49:20 GMT
- Title: Judging Adam: Studying the Performance of Optimization Methods on ML4SE
Tasks
- Authors: Dmitry Pasechnyuk, Anton Prazdnichnykh, Mikhail Evtikhiev, Timofey
Bryksin
- Abstract summary: We test the performance of various optimizers on deep learning models for source code.
We find that the choice of an optimizer can have a significant impact on the model quality.
We suggest that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
- Score: 2.8961929092154697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving a problem with a deep learning model requires researchers to optimize
the loss function with a certain optimization method. The research community
has developed more than a hundred different optimizers, yet there is scarce
data on optimizer performance in various tasks. In particular, none of the
benchmarks test the performance of optimizers on source code-related problems.
However, existing benchmark data indicates that certain optimizers may be more
efficient for particular domains. In this work, we test the performance of
various optimizers on deep learning models for source code and find that the
choice of an optimizer can have a significant impact on the model quality, with
up to two-fold score differences between some of the relatively well-performing
optimizers. We also find that RAdam optimizer (and its modification with the
Lookahead envelope) is the best optimizer that almost always performs well on
the tasks we consider. Our findings show a need for a more extensive study of
the optimizers in code-related tasks, and indicate that the ML4SE community
should consider using RAdam instead of Adam as the default optimizer for
code-related deep learning tasks.
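The abstract names RAdam and its Lookahead-wrapped variant without describing them. The following is a minimal sketch of both update rules for a single scalar parameter, written from the commonly published formulations rather than from this paper's experimental code. The function names `radam_step` and `minimize`, the toy quadratic objective, and all hyperparameter values are assumptions chosen for illustration only.

```python
import math


def radam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one RAdam update to a scalar parameter (illustrative sketch)."""
    state["t"] += 1
    t = state["t"]
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)  # bias-corrected first moment
    # Length of the approximated simple moving average (SMA).
    rho_inf = 2.0 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
    if rho_t > 4:
        # Variance is tractable: adaptive step scaled by the rectification term.
        v_hat = math.sqrt(state["v"] / (1 - beta2 ** t))
        r_t = math.sqrt(
            ((rho_t - 4) * (rho_t - 2) * rho_inf)
            / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
        )
        return theta - lr * r_t * m_hat / (v_hat + eps)
    # Early steps: fall back to an un-adapted momentum update.
    return theta - lr * m_hat


def minimize(grad_fn, theta0, steps=2000, lr=0.05, k=5, alpha=0.5, lookahead=False):
    """Run RAdam, optionally wrapped in Lookahead (slow/fast weights)."""
    theta = slow = theta0
    state = {"t": 0, "m": 0.0, "v": 0.0}
    for step in range(1, steps + 1):
        theta = radam_step(theta, grad_fn(theta), state, lr=lr)
        if lookahead and step % k == 0:
            # Every k fast steps, move the slow weights toward the fast
            # weights and restart the fast weights from the slow ones.
            slow = slow + alpha * (theta - slow)
            theta = slow
    return theta


def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2 (hypothetical)."""
    return 2.0 * (x - 3.0)


x_radam = minimize(grad, 0.0)
x_lookahead = minimize(grad, 0.0, lookahead=True)
```

Both runs start from 0 and should approach the minimizer at 3. In a real training setup one would normally rely on a framework's built-in optimizer implementations instead of a hand-written loop; this sketch only shows where RAdam's rectification term and Lookahead's slow-weight interpolation enter the update.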
Related papers
- Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks.
In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers.
In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values.
We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
arXiv Detail & Related papers (2023-09-07T00:07:15Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Practical tradeoffs between memory, compute, and performance in learned
optimizers [46.04132441790654]
We identify and quantify the memory, compute, and performance trade-offs for many learned and hand-designed optimizers.
We leverage our analysis to construct a learned optimizer that is both faster and more efficient than previous work.
arXiv Detail & Related papers (2022-03-22T16:36:36Z) - Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z) - Descending through a Crowded Valley - Benchmarking Deep Learning
Optimizers [29.624308090226375]
In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed anecdotes.
To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers.
Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods.
arXiv Detail & Related papers (2020-07-03T08:19:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.