Exploring the Optimized Value of Each Hyperparameter in Various Gradient
Descent Algorithms
- URL: http://arxiv.org/abs/2212.12279v1
- Date: Fri, 23 Dec 2022 12:04:33 GMT
- Title: Exploring the Optimized Value of Each Hyperparameter in Various Gradient
Descent Algorithms
- Authors: Abel C. H. Chen
- Abstract summary: Gradient descent algorithms have been applied to parameter optimization in several deep learning models, yielding higher accuracies or lower errors.
This study proposes an analytical framework for analyzing the mean error of each objective function under various gradient descent algorithms.
The experimental results show that the proposed method achieves faster convergence and lower errors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, various gradient descent algorithms including the
methods of gradient descent, gradient descent with momentum, adaptive gradient
(AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment
estimation (Adam) have been applied to the parameter optimization of several
deep learning models with higher accuracies or lower errors. These optimization
algorithms may need to set the values of several hyperparameters which include
a learning rate, momentum coefficients, etc. Furthermore, the convergence speed
and solution accuracy may be influenced by the values of hyperparameters.
Therefore, this study proposes an analytical framework to use mathematical
models for analyzing the mean error of each objective function based on various
gradient descent algorithms. Moreover, the suitable value of each
hyperparameter could be determined by minimizing the mean error. The principles
of hyperparameter value setting have been generalized based on analysis results
for model optimization. The experimental results show that faster convergence
and lower errors can be obtained by the proposed method.
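The five optimizers named in the abstract differ only in how they turn a raw gradient into a step. The minimal sketch below runs each update rule on a toy quadratic; the test function, hyperparameter values, and step counts are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def sgd(w, g, state, lr=0.1):
    # Plain gradient descent: step against the gradient.
    return w - lr * g

def momentum(w, g, state, lr=0.1, beta=0.9):
    # Accumulate a velocity that smooths successive gradients.
    state["v"] = beta * state.get("v", 0.0) + g
    return w - lr * state["v"]

def adagrad(w, g, state, lr=0.5, eps=1e-8):
    # Per-parameter rate shrinks as squared gradients accumulate.
    state["s"] = state.get("s", 0.0) + g ** 2
    return w - lr * g / (np.sqrt(state["s"]) + eps)

def rmsprop(w, g, state, lr=0.1, beta=0.9, eps=1e-8):
    # Like AdaGrad, but with an exponentially decaying average.
    state["s"] = beta * state.get("s", 0.0) + (1 - beta) * g ** 2
    return w - lr * g / (np.sqrt(state["s"]) + eps)

def adam(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Bias-corrected first- and second-moment estimates.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Minimize f(w) = w^2 (gradient 2w) from w = 5 with each method.
results = {}
for step in (sgd, momentum, adagrad, rmsprop, adam):
    w, state = 5.0, {}
    for _ in range(200):
        w = step(w, 2 * w, state)
    results[step.__name__] = w
print(results)
```

As the abstract notes, each method exposes different hyperparameters (learning rate, momentum and decay coefficients), and the convergence behavior on even this one-dimensional problem changes visibly with their values.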
Related papers
- Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks [2.1046873879077794]
We present a cross-entropy optimization method for hyperparameter optimization of a learning algorithm.
The presented method can be applied to other areas of optimization problems in deep learning.
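For readers unfamiliar with the technique, here is a minimal sketch of the cross-entropy method applied to learning-rate search; the proxy objective `loss_after_training` and all constants are hypothetical stand-ins, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_after_training(log_lr):
    # Hypothetical stand-in for a full training run: final loss of
    # gradient descent on f(w) = w^2 at learning rate 10**log_lr.
    lr = 10.0 ** log_lr
    w = 5.0
    for _ in range(50):
        w -= lr * 2 * w
    return w * w

# Cross-entropy method: sample log-learning-rates from a Gaussian,
# refit the Gaussian to the elite (lowest-loss) samples, repeat.
mu, sigma = -3.0, 1.5
for _ in range(20):
    samples = np.clip(rng.normal(mu, sigma, size=50), -6.0, 0.0)
    losses = np.array([loss_after_training(s) for s in samples])
    elite = samples[np.argsort(losses)[:10]]
    mu, sigma = elite.mean(), elite.std() + 1e-3
print("selected learning rate ~", 10.0 ** mu)
```

The sampling distribution concentrates around the best-performing learning rates over the iterations, which is the core idea the paper builds on.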
arXiv Detail & Related papers (2024-09-14T00:39:37Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the studied optimizers, parameterizations, and learning rates.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations [34.500484733973536]
Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging.
We propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs.
We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
arXiv Detail & Related papers (2024-02-19T15:36:36Z)
- A Multi-objective Newton Optimization Algorithm for Hyper-Parameter Search [0.0]
The algorithm is applied to search for the optimal probability threshold (a vector of eight parameters) for a multiclass object detection problem with a convolutional neural network.
The algorithm produces overall higher true positive (TP) and lower false positive (FP) rates, as compared to using the default threshold of 0.5.
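A minimal stand-in for this kind of threshold search follows, using per-class coordinate search on a grid rather than the paper's multi-objective Newton algorithm; the synthetic data and the simple tp - fp score are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 3

# Hypothetical predicted class probabilities and true labels for a
# toy multiclass detection problem (all of this data is made up).
labels = rng.integers(0, n_classes, size=500)
probs = rng.dirichlet(np.ones(n_classes), size=500)
probs[np.arange(500), labels] += 0.4       # make predictions informative
probs /= probs.sum(axis=1, keepdims=True)

def tp_fp(thresholds):
    # A detection fires for class c when probs[:, c] >= thresholds[c].
    fired = probs >= thresholds
    tp = int(fired[np.arange(len(labels)), labels].sum())
    fp = int(fired.sum()) - tp
    return tp, fp

# Baseline: the default threshold of 0.5 for every class.
tp_def, fp_def = tp_fp(np.full(n_classes, 0.5))

# Per-class coordinate search over a grid (0.5 is kept in the grid
# so each coordinate update can only improve the score).
thresholds = np.full(n_classes, 0.5)
grid = np.unique(np.append(np.linspace(0.05, 0.95, 19), 0.5))
for c in range(n_classes):
    scores = []
    for t in grid:
        trial = thresholds.copy()
        trial[c] = t
        tp, fp = tp_fp(trial)
        scores.append(tp - fp)
    thresholds[c] = grid[int(np.argmax(scores))]

tp, fp = tp_fp(thresholds)
print("tuned:", thresholds, "TP:", tp, "FP:", fp, "default TP/FP:", tp_def, fp_def)
```

The tuned per-class thresholds never score worse than the uniform 0.5 baseline on this objective, mirroring the qualitative claim above.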
arXiv Detail & Related papers (2024-01-07T21:12:34Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
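Spectral normalization in general rescales a matrix by its largest singular value so that it cannot amplify inputs; a minimal power-iteration sketch of that idea (not the paper's specific method for long model unrolls) might look like:

```python
import numpy as np

def spectral_normalize(W, n_iter=100):
    # Estimate the largest singular value of W by power iteration,
    # then rescale so the result has spectral norm ~1.
    u = np.random.default_rng(0).normal(size=W.shape[0])
    v = None
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # leading singular value estimate
    return W / sigma

W = np.random.default_rng(1).normal(size=(4, 3)) * 3.0
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))   # spectral norm of the rescaled matrix
```

Bounding the spectral norm this way limits how much each step of a long unrolled model can stretch gradients, which is the mechanism behind the variance mitigation described above.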
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- How to Prove the Optimized Values of Hyperparameters for Particle Swarm Optimization? [0.0]
This study proposes an analytic framework to analyze the optimized average-fitness-function-value (AFFV) based on mathematical models for a variety of fitness functions.
Experimental results show that the hyperparameter values from the proposed method achieve faster convergence and lower AFFVs.
arXiv Detail & Related papers (2023-02-01T00:33:35Z)
- Multi-objective hyperparameter optimization with performance uncertainty [62.997667081978825]
This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of Machine Learning algorithms.
We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise.
Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR.
arXiv Detail & Related papers (2022-09-09T14:58:43Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update direction is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
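The general idea of direction-aligned rate adaptation can be sketched as follows; this is a Rprop-style illustration under assumed update rules, not AdaRem's actual algorithm:

```python
import numpy as np

def direction_aware_step(w, g, state, lr=0.1, up=1.2, down=0.5,
                         lr_min=1e-4, lr_max=1.0):
    # Grow a parameter's rate while its gradient keeps the same sign
    # (consistent direction); shrink it when the sign flips.
    rates = state.setdefault("rates", np.full_like(w, lr))
    prev_g = state.setdefault("prev_g", np.zeros_like(w))
    agree = np.sign(g) == np.sign(prev_g)
    rates *= np.where(agree, up, down)
    np.clip(rates, lr_min, lr_max, out=rates)
    state["prev_g"] = g
    return w - rates * g

# Minimize f(w) = sum(w^2) (gradient 2w) from two starting points.
w, state = np.array([5.0, -3.0]), {}
for _ in range(100):
    w = direction_aware_step(w, 2.0 * w, state)
print(w)
```

Each parameter accelerates while its gradient direction stays consistent and brakes as soon as it oscillates, which is the qualitative behavior the summary attributes to AdaRem.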
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.