Related papers: Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks

URL: http://arxiv.org/abs/2303.03540v1
Date: Mon, 6 Mar 2023 22:49:20 GMT
Title: Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks
Authors: Dmitry Pasechnyuk, Anton Prazdnichnykh, Mikhail Evtikhiev, Timofey Bryksin
Abstract summary: We test the performance of various optimizers on deep learning models for source code. We find that the choice of an optimizer can have a significant impact on the model quality. We suggest that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
Score: 2.8961929092154697
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Solving a problem with a deep learning model requires researchers to optimize the loss function with a certain optimization method. The research community has developed more than a hundred different optimizers, yet there is scarce data on optimizer performance in various tasks. In particular, none of the benchmarks test the performance of optimizers on source code-related problems. However, existing benchmark data indicates that certain optimizers may be more efficient for particular domains. In this work, we test the performance of various optimizers on deep learning models for source code and find that the choice of an optimizer can have a significant impact on the model quality, with up to two-fold score differences between some of the relatively well-performing optimizers. We also find that RAdam optimizer (and its modification with the Lookahead envelope) is the best optimizer that almost always performs well on the tasks we consider. Our findings show a need for a more extensive study of the optimizers in code-related tasks, and indicate that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
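
Illustration (not from the paper): the abstract recommends RAdam over Adam but does not describe the update itself. The sketch below is a minimal pure-Python version of the RAdam rule as it is commonly stated (Adam moments plus a variance-rectification term that falls back to a momentum-only step during the first few iterations). The function names `radam_step` and `minimize_quadratic`, the toy objective, and all hyperparameter values are assumptions chosen for illustration; they are not the authors' code or experimental setup.

```python
import math


def radam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one RAdam update to a scalar parameter and return the new value.

    `state` is a dict holding the step count `t` and the moment estimates
    `m` (first moment) and `v` (second moment); it is updated in place.
    """
    state["t"] += 1
    t = state["t"]

    # Exponential moving averages of the gradient and squared gradient (as in Adam).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)  # bias-corrected first moment

    # Length of the approximated simple moving average.
    rho_inf = 2.0 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)

    if rho_t > 4:
        # Variance is tractable: take a rectified adaptive step.
        v_hat = math.sqrt(state["v"] / (1 - beta2 ** t))
        r_t = math.sqrt(
            ((rho_t - 4) * (rho_t - 2) * rho_inf)
            / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
        )
        return theta - lr * r_t * m_hat / (v_hat + eps)

    # Early steps: fall back to a non-adaptive momentum update.
    return theta - lr * m_hat


def minimize_quadratic(steps=2000, lr=0.1, start=0.0, target=3.0):
    """Minimize the toy objective f(x) = (x - target)^2 with radam_step."""
    x = start
    state = {"t": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        grad = 2.0 * (x - target)  # derivative of (x - target)^2
        x = radam_step(x, grad, state, lr=lr)
    return x


if __name__ == "__main__":
    print(minimize_quadratic())
```

In deep learning frameworks the switch is usually a one-line change rather than a hand-written update; PyTorch, for example, ships `torch.optim.RAdam` alongside `torch.optim.Adam`, though its rectification threshold may differ from the one used in this sketch.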

Related papers

metaTextGrad: Automatically optimizing language model optimizers [28.39185344194562]
Large language models (LLMs) are increasingly used in learning algorithms, evaluations, and optimization tasks. Recent studies have shown that using LLM-based optimizers to automatically optimize model prompts, demonstrations, predictions themselves, or other components can significantly enhance the performance of AI systems. Our approach consists of two key components: a meta prompt optimizer and a meta structure optimizer. The combination of these two significantly improves performance across multiple benchmarks, achieving an average absolute performance improvement of up to 6% compared to the best baseline.
arXiv Detail & Related papers (2025-05-24T05:40:38Z)
Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z)
Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention. Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
arXiv Detail & Related papers (2024-01-17T00:16:46Z)
Large Language Models as Optimizers [106.52386531624532]
We propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values. We demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
arXiv Detail & Related papers (2023-09-07T00:07:15Z)
Improving Performance Insensitivity of Large-scale Multiobjective Optimization via Monte Carlo Tree Search [7.34812867861951]
We propose an evolutionary algorithm for solving large-scale multiobjective optimization problems based on Monte Carlo tree search. The proposed method samples the decision variables to construct new nodes on the Monte Carlo tree for optimization and evaluation. It selects nodes with good evaluation for further search to reduce the performance sensitivity caused by large-scale decision variables.
arXiv Detail & Related papers (2023-04-08T17:15:49Z)
VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
Practical tradeoffs between memory, compute, and performance in learned optimizers [46.04132441790654]
We identify and quantify the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We leverage our analysis to construct a learned optimizer that is both faster and more efficient than previous work.
arXiv Detail & Related papers (2022-03-22T16:36:36Z)
Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance? First, we define three differentiable amalgamation mechanisms to amalgamate a pool of analytical optimizers by gradient descent. In order to reduce variance of the amalgamation process, we also explore methods to stabilize the process by perturbing the amalgamation target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers [29.624308090226375]
In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed anecdotes. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods.
arXiv Detail & Related papers (2020-07-03T08:19:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.