Do optimization methods in deep learning applications matter?
- URL: http://arxiv.org/abs/2002.12642v1
- Date: Fri, 28 Feb 2020 10:36:40 GMT
- Title: Do optimization methods in deep learning applications matter?
- Authors: Buse Melis Ozyildirim (1), Mariam Kiran (2) ((1) Department of
Computer Engineering Cukurova University, (2) Energy Sciences Network
Lawrence Berkeley National Laboratory)
- Abstract summary: The paper presents arguments on which optimization functions to use and further, which functions would benefit from parallelization efforts.
Our experiments compare off-the-shelf optimization functions (CG, SGD, LM and L-BFGS) in standard CIFAR, MNIST, CartPole and FlappyBird experiments.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advances in deep learning, exponential data growth and increasing model
complexity, developing efficient optimization methods is attracting much
research attention. Several implementations favor Conjugate Gradient (CG) and
Stochastic Gradient Descent (SGD) as practical and elegant solutions to achieve
quick convergence; however, these optimization processes also present many
limitations in learning across deep learning applications. Recent research is
exploring higher-order optimization functions as better approaches, but these
present very complex computational challenges for practical use. Comparing
first- and higher-order optimization functions, our experiments in this paper
reveal that Levenberg-Marquardt (LM) significantly improves convergence but
suffers from very long processing times, increasing the training complexity of
both classification and reinforcement learning problems. Our experiments compare
off-the-shelf optimization functions (CG, SGD, LM and L-BFGS) in standard CIFAR,
MNIST, CartPole and FlappyBird experiments. The paper presents arguments on
which optimization functions to use and, further, which functions would benefit
from parallelization efforts to improve pretraining time and learning-rate
convergence.
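For illustration, the optimizer comparison described in the abstract can be sketched with off-the-shelf PyTorch optimizers. This is a minimal sketch, not the authors' code: torch.optim provides SGD and L-BFGS but no built-in CG or Levenberg-Marquardt, so only the former two appear, and the small model, batch size and learning rates are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): contrast a first-order optimizer (SGD)
# with a quasi-Newton one (L-BFGS) on MNIST. Hyperparameters are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=256, shuffle=True)

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                         nn.Linear(128, 10)).to(device)

def train_one_epoch(opt_name):
    model, loss_fn = make_model(), nn.CrossEntropyLoss()
    if opt_name == "sgd":
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
    else:
        # L-BFGS re-evaluates the loss several times per step via a closure.
        opt = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=10)
    for x, y in loader:
        x, y = x.to(device), y.to(device)

        def closure():
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            return loss

        # step(closure) works for both optimizers; L-BFGS requires the closure.
        opt.step(closure)
    return model

for name in ("sgd", "lbfgs"):
    train_one_epoch(name)
    print(f"finished one MNIST epoch with {name}")
```

The closure interface hints at where the wall-clock gap comes from: quasi-Newton and higher-order methods re-evaluate the loss (and gradients) several times per parameter update, whereas SGD needs a single evaluation per step.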
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning [69.95292905263393]
We show that gradient-based optimization and large language models (LLMs) are complementary to each other, suggesting a collaborative optimization approach.
Our code is released at https://www.guozix.com/guozix/LLM-catalyst.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes conditional stochastic optimization algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - A Data-Driven Evolutionary Transfer Optimization for Expensive Problems
in Dynamic Environments [9.098403098464704]
Data-driven, a.k.a. surrogate-assisted, evolutionary optimization has been recognized as an effective approach for tackling expensive black-box optimization problems.
This paper proposes a simple but effective transfer learning framework to empower data-driven evolutionary optimization to solve dynamic optimization problems.
Experiments on synthetic benchmark test problems and a real-world case study demonstrate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2022-11-05T11:19:50Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Learning for Robust Combinatorial Optimization: Algorithm and
Application [26.990988571097827]
Learning to optimize (L2O) has emerged as a promising approach to solving optimization problems by exploiting the strong prediction power of neural networks.
In this paper, we propose a novel learning-based optimization framework, called LRCO (Learning for Robust Combinatorial Optimization), which quickly outputs a robust solution in the presence of uncertain context.
Our results highlight that LRCO can greatly reduce the worst-case cost and runtime, while having a very low complexity.
arXiv Detail & Related papers (2021-12-20T07:58:50Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for solving many machine learning problems.
We propose a novel sample-efficient stochastic gradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine
Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update (a minimal sketch of these steps is given after this list).
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces convergence time by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Learning to be Global Optimizer [28.88646928299302]
We learn an optimization algorithm with an optimal network structure and escaping capability on some benchmark functions.
We show that the learned algorithm significantly outperforms some well-known classical optimization algorithms.
arXiv Detail & Related papers (2020-03-10T03:46:25Z)
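As a closing illustration of the three zeroth-order steps described in the ZO primer entry above, here is a minimal NumPy sketch. The toy quadratic objective, the sign-based descent direction (in the spirit of the ZO-signGD entry), and all step sizes are illustrative assumptions, not taken from any of the cited papers.

```python
# Minimal sketch of the three zeroth-order (ZO) steps: (1) gradient estimation
# from function queries only, (2) descent direction, (3) solution update.
# The quadratic objective and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def blackbox(x):
    # Stand-in for a black-box objective (no gradients available).
    return np.sum((x - 1.0) ** 2)

def zo_gradient(f, x, num_dirs=20, mu=1e-3):
    # Step 1: two-point random-direction gradient estimate.
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

x = rng.standard_normal(5)
lr = 0.05
for t in range(200):
    g = zo_gradient(blackbox, x)   # Step 1: gradient estimation
    d = -np.sign(g)                # Step 2: descent direction (ZO-signGD style)
    x = x + lr * d                 # Step 3: solution update

print("final objective:", blackbox(x))
```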