Exploring the Optimized Value of Each Hyperparameter in Various Gradient
Descent Algorithms
- URL: http://arxiv.org/abs/2212.12279v1
- Date: Fri, 23 Dec 2022 12:04:33 GMT
- Title: Exploring the Optimized Value of Each Hyperparameter in Various Gradient
Descent Algorithms
- Authors: Abel C. H. Chen
- Abstract summary: Gradient descent algorithms have been applied to parameter optimization in several deep learning models, yielding higher accuracies or lower errors.
This study proposes an analytical framework for analyzing the mean error of each objective function under various gradient descent algorithms.
The experimental results show that the proposed method achieves faster convergence and lower errors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, various gradient descent algorithms including the
methods of gradient descent, gradient descent with momentum, adaptive gradient
(AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment
estimation (Adam) have been applied to the parameter optimization of several
deep learning models with higher accuracies or lower errors. These optimization
algorithms may need to set the values of several hyperparameters which include
a learning rate, momentum coefficients, etc. Furthermore, the convergence speed
and solution accuracy may be influenced by the values of hyperparameters.
Therefore, this study proposes an analytical framework to use mathematical
models for analyzing the mean error of each objective function based on various
gradient descent algorithms. Moreover, the suitable value of each
hyperparameter could be determined by minimizing the mean error. The principles
of hyperparameter value setting have been generalized based on analysis results
for model optimization. The experimental results show that faster convergence
and lower errors can be obtained by the proposed method.
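The five optimizers named in the abstract differ only in how they turn a raw gradient into a step. The minimal sketch below runs each update rule on a toy quadratic; the test function, hyperparameter values, and step counts are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def sgd(w, g, state, lr=0.1):
    # Plain gradient descent: step against the gradient.
    return w - lr * g

def momentum(w, g, state, lr=0.1, beta=0.9):
    # Accumulate a velocity that smooths successive gradients.
    state["v"] = beta * state.get("v", 0.0) + g
    return w - lr * state["v"]

def adagrad(w, g, state, lr=0.5, eps=1e-8):
    # Per-parameter rate shrinks as squared gradients accumulate.
    state["s"] = state.get("s", 0.0) + g ** 2
    return w - lr * g / (np.sqrt(state["s"]) + eps)

def rmsprop(w, g, state, lr=0.1, beta=0.9, eps=1e-8):
    # Like AdaGrad, but with an exponentially decaying average.
    state["s"] = beta * state.get("s", 0.0) + (1 - beta) * g ** 2
    return w - lr * g / (np.sqrt(state["s"]) + eps)

def adam(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Bias-corrected first- and second-moment estimates.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Minimize f(w) = w^2 (gradient 2w) from w = 5 with each method.
results = {}
for step in (sgd, momentum, adagrad, rmsprop, adam):
    w, state = 5.0, {}
    for _ in range(200):
        w = step(w, 2 * w, state)
    results[step.__name__] = w
print(results)
```

As the abstract notes, each method exposes different hyperparameters (learning rate, momentum and decay coefficients), and the convergence behavior on even this one-dimensional problem changes visibly with their values.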
Related papers
- Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks [2.1046873879077794]
We present a cross-entropy optimization method for hyperparameter optimization of a learning algorithm.
The presented method can be applied to other areas of optimization problems in deep learning.
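For readers unfamiliar with the technique, here is a minimal sketch of the cross-entropy method applied to learning-rate search; the proxy objective `loss_after_training` and all constants are hypothetical stand-ins, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_after_training(log_lr):
    # Hypothetical stand-in for a full training run: final loss of
    # gradient descent on f(w) = w^2 at learning rate 10**log_lr.
    lr = 10.0 ** log_lr
    w = 5.0
    for _ in range(50):
        w -= lr * 2 * w
    return w * w

# Cross-entropy method: sample log-learning-rates from a Gaussian,
# refit the Gaussian to the elite (lowest-loss) samples, repeat.
mu, sigma = -3.0, 1.5
for _ in range(20):
    samples = np.clip(rng.normal(mu, sigma, size=50), -6.0, 0.0)
    losses = np.array([loss_after_training(s) for s in samples])
    elite = samples[np.argsort(losses)[:10]]
    mu, sigma = elite.mean(), elite.std() + 1e-3
print("selected learning rate ~", 10.0 ** mu)
```

The sampling distribution concentrates around the best-performing learning rates over the iterations, which is the core idea the paper builds on.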
arXiv Detail & Related papers (2024-09-14T00:39:37Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the studied optimizers, parameterizations, and learning rates.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations [34.500484733973536]
Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging.
We propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs.
We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
arXiv Detail & Related papers (2024-02-19T15:36:36Z)
- A Multi-objective Newton Optimization Algorithm for Hyper-Parameter Search [0.0]
The algorithm is applied to search for the optimal probability threshold (a vector of eight parameters) for a multiclass object detection problem with a convolutional neural network.
The algorithm produces overall higher true positive (TP) and lower false positive (FP) rates, as compared to using the default threshold of 0.5.
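A minimal stand-in for this kind of threshold search follows, using per-class coordinate search on a grid rather than the paper's multi-objective Newton algorithm; the synthetic data and the simple tp - fp score are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 3

# Hypothetical predicted class probabilities and true labels for a
# toy multiclass detection problem (all of this data is made up).
labels = rng.integers(0, n_classes, size=500)
probs = rng.dirichlet(np.ones(n_classes), size=500)
probs[np.arange(500), labels] += 0.4       # make predictions informative
probs /= probs.sum(axis=1, keepdims=True)

def tp_fp(thresholds):
    # A detection fires for class c when probs[:, c] >= thresholds[c].
    fired = probs >= thresholds
    tp = int(fired[np.arange(len(labels)), labels].sum())
    fp = int(fired.sum()) - tp
    return tp, fp

# Baseline: the default threshold of 0.5 for every class.
tp_def, fp_def = tp_fp(np.full(n_classes, 0.5))

# Per-class coordinate search over a grid (0.5 is kept in the grid
# so each coordinate update can only improve the score).
thresholds = np.full(n_classes, 0.5)
grid = np.unique(np.append(np.linspace(0.05, 0.95, 19), 0.5))
for c in range(n_classes):
    scores = []
    for t in grid:
        trial = thresholds.copy()
        trial[c] = t
        tp, fp = tp_fp(trial)
        scores.append(tp - fp)
    thresholds[c] = grid[int(np.argmax(scores))]

tp, fp = tp_fp(thresholds)
print("tuned:", thresholds, "TP:", tp, "FP:", fp, "default TP/FP:", tp_def, fp_def)
```

The tuned per-class thresholds never score worse than the uniform 0.5 baseline on this objective, mirroring the qualitative claim above.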
arXiv Detail & Related papers (2024-01-07T21:12:34Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
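Spectral normalization in general rescales a matrix by its largest singular value so that it cannot amplify inputs; a minimal power-iteration sketch of that idea (not the paper's specific method for long model unrolls) might look like:

```python
import numpy as np

def spectral_normalize(W, n_iter=100):
    # Estimate the largest singular value of W by power iteration,
    # then rescale so the result has spectral norm ~1.
    u = np.random.default_rng(0).normal(size=W.shape[0])
    v = None
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # leading singular value estimate
    return W / sigma

W = np.random.default_rng(1).normal(size=(4, 3)) * 3.0
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))   # spectral norm of the rescaled matrix
```

Bounding the spectral norm this way limits how much each step of a long unrolled model can stretch gradients, which is the mechanism behind the variance mitigation described above.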
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- How to Prove the Optimized Values of Hyperparameters for Particle Swarm Optimization? [0.0]
This study proposes an analytic framework to analyze the optimized average-fitness-function-value (AFFV) based on mathematical models for a variety of fitness functions.
Experimental results show that the hyperparameter values from the proposed method achieve faster convergence and lower AFFVs.
arXiv Detail & Related papers (2023-02-01T00:33:35Z)
- Multi-objective hyperparameter optimization with performance uncertainty [62.997667081978825]
This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of Machine Learning algorithms.
We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise.
Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR.
arXiv Detail & Related papers (2022-09-09T14:58:43Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update direction is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
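The general idea of direction-aligned rate adaptation can be sketched as follows; this is a Rprop-style illustration under assumed update rules, not AdaRem's actual algorithm:

```python
import numpy as np

def direction_aware_step(w, g, state, lr=0.1, up=1.2, down=0.5,
                         lr_min=1e-4, lr_max=1.0):
    # Grow a parameter's rate while its gradient keeps the same sign
    # (consistent direction); shrink it when the sign flips.
    rates = state.setdefault("rates", np.full_like(w, lr))
    prev_g = state.setdefault("prev_g", np.zeros_like(w))
    agree = np.sign(g) == np.sign(prev_g)
    rates *= np.where(agree, up, down)
    np.clip(rates, lr_min, lr_max, out=rates)
    state["prev_g"] = g
    return w - rates * g

# Minimize f(w) = sum(w^2) (gradient 2w) from two starting points.
w, state = np.array([5.0, -3.0]), {}
for _ in range(100):
    w = direction_aware_step(w, 2.0 * w, state)
print(w)
```

Each parameter accelerates while its gradient direction stays consistent and brakes as soon as it oscillates, which is the qualitative behavior the summary attributes to AdaRem.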
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.