Related papers: Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules

Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules

URL: http://arxiv.org/abs/2509.18396v1
Date: Mon, 22 Sep 2025 20:29:54 GMT
Title: Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
Authors: Doğay Altınel,
Abstract summary: This study aims to provide a review of various gradients that have been proposed and received attention in the literature.<n>Momentum, AdamW, Sophia, and Muon are examined individually, and their distinctive features are highlighted.<n>Insights are offered into the open challenges encountered in the optimization of deep learning models.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Deep learning optimizers are optimization algorithms that enable deep neural networks to learn. The effectiveness of learning is highly dependent on the optimizer employed in the training process. Alongside the rapid advancement of deep learning, a wide range of optimizers with different approaches have been developed. This study aims to provide a review of various optimizers that have been proposed and received attention in the literature. From Stochastic gradient descent to the most recent ones such as Momentum, AdamW, Sophia, and Muon in chronological order, optimizers are examined individually, and their distinctive features are highlighted in the study. The update rule of each optimizer is presented in detail, with an explanation of the associated concepts and variables. The techniques applied by these optimizers, their contributions to the optimization process, and their default hyperparameter settings are also discussed. In addition, insights are offered into the open challenges encountered in the optimization of deep learning models. Thus, a comprehensive resource is provided both for understanding the current state of optimizers and for identifying potential areas of future development.

Related papers

Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.<n>We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
Learning Joint Models of Prediction and Optimization [56.04498536842065]
Predict-Then-Then framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving. This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by joint predictive models.
arXiv Detail & Related papers (2024-09-07T19:52:14Z)
Narrowing the Focus: Learned Optimizers for Pretrained Models [24.685918556547055]
We propose a novel technique that learns a layer-specific linear combination of update directions provided by a set of base work tasks. When evaluated on an image, this specialized significantly outperforms both traditional off-the-shelf methods such as Adam, as well existing general learneds.
arXiv Detail & Related papers (2024-08-17T23:55:19Z)
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts.<n>We identify two pivotal factors in model parameter learning: update direction and update method.<n>We develop a capable Gradient-inspired Prompt-based GPO.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives. It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
Investigation into the Training Dynamics of Learned Optimizers [0.0]
We look at the concept of learneds as a way to accelerate the optimization process by replacing traditional, hand-crafted algorithms with meta-learned functions. Our work examines their optimization from the perspective of network architecture symmetries and update parameters. We identify several key insights that demonstrate how each approach can benefit from the strengths of the other.
arXiv Detail & Related papers (2023-12-12T11:18:43Z)
Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
Predict-Then- framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving. This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step. This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z)
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty [47.73071218563257]
This review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under a uniform notation and terminology and classify them according to the three main frameworks.
arXiv Detail & Related papers (2023-06-17T15:21:02Z)
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant [18.592094066642364]
This article provides a comprehensive understanding of optimization in deep learning. We focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively. To help understand the current optimization methodologies, we categorize them into two classes: explicit optimization and implicit optimization.
arXiv Detail & Related papers (2023-06-15T17:59:27Z)
Optimization Methods in Deep Learning: A Comprehensive Overview [0.0]
Deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition. The effectiveness of deep learning largely depends on the optimization methods used to train deep neural networks. We provide an overview of first-order optimization methods such as Gradient Descent, Adagrad, Adadelta, and RMSprop, as well as recent momentum-based and adaptive gradient methods such as Nesterov accelerated gradient, Adam, Nadam, AdaMax, and AMSGrad.
arXiv Detail & Related papers (2023-02-19T13:01:53Z)
Learning to Optimize with Dynamic Mode Decomposition [0.0]
We show how to utilize the dynamic mode decomposition method for extracting informative features about optimization dynamics. We show that our learned generalizes much better to unseen optimization problems in short.
arXiv Detail & Related papers (2022-11-29T14:55:59Z)
VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatiles. We train an ingest for deep learning which is itself a small neural network that ingests and outputs parameter updates. We open source our learned, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
Reverse engineering learned optimizers reveals known and novel mechanisms [50.50540910474342]
Learneds are algorithms that can themselves be trained to solve optimization problems. Our results help elucidate the previously murky understanding of how learneds work, and establish tools for interpreting future learneds.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.