Optimal Hyperparameter $\epsilon$ for Adaptive Stochastic Optimizers
through Gradient Histograms
- URL: http://arxiv.org/abs/2311.11532v1
- Date: Mon, 20 Nov 2023 04:34:19 GMT
- Title: Optimal Hyperparameter $\epsilon$ for Adaptive Stochastic Optimizers
through Gradient Histograms
- Authors: Gustavo Silva, Paul Rodriguez
- Abstract summary: We introduce a new framework based on gradient histograms to analyze and justify attributes adaptives.
We propose a novel gradient histogram-based algorithm that automatically estimates a reduced and accurate search space for the safeguard factor $epsilon$.
- Score: 0.8702432681310399
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Optimizers are essential components for successfully training deep neural
network models. In order to achieve the best performance from such models,
designers need to carefully choose the optimizer hyperparameters. However, this
can be a computationally expensive and time-consuming process. Although it is
known that all optimizer hyperparameters must be tuned for maximum performance,
there is still a lack of clarity regarding the individual influence of minor
priority hyperparameters, including the safeguard factor $\epsilon$ and
momentum factor $\beta$, in leading adaptive optimizers (specifically, those
based on the Adam optimizers). In this manuscript, we introduce a new framework
based on gradient histograms to analyze and justify important attributes of
adaptive optimizers, such as their optimal performance and the relationships
and dependencies among hyperparameters. Furthermore, we propose a novel
gradient histogram-based algorithm that automatically estimates a reduced and
accurate search space for the safeguard hyperparameter $\epsilon$, where the
optimal value can be easily found.
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - End-to-End Learning for Fair Multiobjective Optimization Under
Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z) - Comparative Evaluation of Metaheuristic Algorithms for Hyperparameter
Selection in Short-Term Weather Forecasting [0.0]
This paper explores the application of metaheuristic algorithms, namely Genetic Algorithm (GA), Differential Evolution (DE) and Particle Swarm Optimization (PSO)
We evaluate their performance in weather forecasting based on metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE)
arXiv Detail & Related papers (2023-09-05T22:13:35Z) - Hyper-parameter optimization based on soft actor critic and hierarchical
mixture regularization [5.063728016437489]
We model hyper- parameter optimization process as a Markov decision process, and tackle it with reinforcement learning.
A novel hyper- parameter optimization method based on soft actor critic and hierarchical mixture regularization has been proposed.
arXiv Detail & Related papers (2021-12-08T02:34:43Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used estimator based on moving average (SEMA) problems.
For all these-the-art results, we also present the results for all these-the-art problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG)
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyper parameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z) - Efficient hyperparameter optimization by way of PAC-Bayes bound
minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
The proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
We show that the proposed algorithm canally converge to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Towards Automatic Bayesian Optimization: A first step involving
acquisition functions [0.0]
Bayesian optimization is the state of the art technique for the optimization of black boxes, i.e., functions where we do not have access to their analytical expression.
We propose a first attempt over automatic bayesian optimization by exploring several techniques that automatically tune the acquisition function.
arXiv Detail & Related papers (2020-03-21T12:22:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.