A Unified Gaussian Process for Branching and Nested Hyperparameter
Optimization
- URL: http://arxiv.org/abs/2402.04885v1
- Date: Fri, 19 Jan 2024 21:11:32 GMT
- Title: A Unified Gaussian Process for Branching and Nested Hyperparameter
Optimization
- Authors: Jiazhao Zhang and Ying Hung and Chung-Ching Lin and Zicheng Liu
- Abstract summary: In deep learning, tuning parameters with conditional dependence are common in practice.
The new GP model accounts for the dependent structure among input variables through a new kernel function.
High prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real data applications of neural networks.
- Score: 19.351804144005744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choosing appropriate hyperparameters plays a crucial role in the success of
neural networks as hyper-parameters directly control the behavior and
performance of the training algorithms. To obtain efficient tuning, Bayesian
optimization methods based on Gaussian process (GP) models are widely used.
Despite numerous applications of Bayesian optimization in deep learning, the
existing methodologies are developed based on a convenient but restrictive
assumption that the tuning parameters are independent of each other. However,
tuning parameters with conditional dependence are common in practice. In this
paper, we focus on two types of them: branching and nested parameters. Nested
parameters refer to those tuning parameters that exist only within a particular
setting of another tuning parameter, and a parameter within which other
parameters are nested is called a branching parameter. To capture the
conditional dependence between branching and nested parameters, a unified
Bayesian optimization framework is proposed. The sufficient conditions are
rigorously derived to guarantee the validity of the kernel function, and the
asymptotic convergence of the proposed optimization framework is proven under
the continuum-armed-bandit setting. Based on the new GP model, which accounts
for the dependent structure among input variables through a new kernel
function, higher prediction accuracy and better optimization efficiency are
observed in a series of synthetic simulations and real data applications of
neural networks. Sensitivity analysis is also performed to provide insights
into how changes in hyperparameter values affect prediction accuracy.
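To make the branching/nested structure concrete, here is a minimal sketch of such a search space together with one simple way to build a valid kernel over it. The optimizer/momentum/beta2 space and the additive delta-kernel construction are illustrative assumptions made for this summary; they are not the unified kernel or the sufficient conditions derived in the paper.

```python
import numpy as np

# Illustrative search space (an assumption for this sketch, not taken from the
# paper): "optimizer" is a branching parameter, "momentum" is nested under the
# "sgd" branch, "beta2" is nested under the "adam" branch, and "log_lr" is a
# shared parameter present in every configuration.
def sample_config(rng):
    branch = rng.choice(["sgd", "adam"])
    cfg = {"optimizer": branch, "log_lr": rng.uniform(-5.0, -1.0)}
    if branch == "sgd":
        cfg["momentum"] = rng.uniform(0.0, 0.99)      # exists only under sgd
    else:
        cfg["beta2"] = rng.uniform(0.9, 0.9999)       # exists only under adam
    return cfg

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * ((a - b) / ls) ** 2)

def branching_kernel(c1, c2):
    """Toy positive-definite kernel for branching/nested inputs.

    k = k_shared + 1[same branch] * k_nested.  This is a sum of a kernel on
    the shared dimension and a delta kernel on the branch multiplied by a
    kernel on the nested dimension, hence positive definite.  The paper's
    unified kernel is more general; this only illustrates the structure.
    """
    k_shared = rbf(c1["log_lr"], c2["log_lr"])
    if c1["optimizer"] != c2["optimizer"]:
        return k_shared
    nested_key = "momentum" if c1["optimizer"] == "sgd" else "beta2"
    return k_shared + rbf(c1[nested_key], c2[nested_key], ls=0.3)

rng = np.random.default_rng(0)
configs = [sample_config(rng) for _ in range(5)]
K = np.array([[branching_kernel(a, b) for b in configs] for a in configs])
print(np.round(K, 3))   # Gram matrix that respects the branching structure
```

Under this toy kernel, configurations on different branches still share information through the learning-rate component, while nested parameters contribute only when the branch values match, which is the qualitative behavior the paper's kernel is designed to capture.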
Related papers
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the optimizers, parameterizations, learning rates, and model sizes studied.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z) - Parameter Optimization with Conscious Allocation (POCA) [4.478575931884855]
Hyperband-based approaches are among the most effective methods for hyperparameter optimization in machine learning.
We present Parameter Optimization with Conscious Allocation (POCA), a new hyperband-based algorithm that adaptively allocates the input budget to the hyperparameter configurations it generates.
POCA finds strong configurations faster in both settings.
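POCA itself is not specified in enough detail in this summary to reproduce; as a point of reference, the sketch below shows only the generic successive-halving budget allocation that Hyperband-style methods build on, with a made-up configuration sampler and a synthetic loss. POCA's adaptive splitting of the input budget is not shown.

```python
import numpy as np

def successive_halving(sample_config, evaluate, n_configs=27, min_budget=1, eta=3):
    """Generic successive halving: give every configuration a small budget,
    keep the best 1/eta fraction, and repeat with eta times the budget.
    Hyperband-style methods such as POCA build on this allocation scheme."""
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        scores = [evaluate(c, budget) for c in configs]
        order = np.argsort(scores)                 # lower score = better
        keep = max(1, len(configs) // eta)
        configs = [configs[i] for i in order[:keep]]
        budget *= eta
    return configs[0]

# Toy usage: configurations are learning rates, and the "validation loss" is a
# synthetic function whose noise shrinks as the budget grows.
rng = np.random.default_rng(0)
best = successive_halving(
    sample_config=lambda: 10 ** rng.uniform(-5, -1),
    evaluate=lambda lr, b: (np.log10(lr) + 3) ** 2 + rng.normal(0, 1.0 / b),
)
print(f"selected learning rate: {best:.2e}")
```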
arXiv Detail & Related papers (2023-12-29T00:13:55Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
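As a rough illustration of allocating trainable parameters to important positions, the sketch below ranks weights by a common gradient-based sensitivity score and unfreezes only a small top fraction. The score, the 5% budget, and the masking scheme are assumptions made here for illustration; SPT's actual criterion and allocation are described in the paper.

```python
import torch
import torch.nn as nn

# Tiny stand-in model and task batch (assumed for illustration only).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# One common sensitivity proxy: |gradient * parameter| on a task batch.
sensitivity = torch.cat([(p.grad * p).abs().flatten() for p in model.parameters()])
k = int(0.05 * sensitivity.numel())                # tune only ~5% of the weights
threshold = torch.topk(sensitivity, k).values.min()

# Boolean masks marking the trainable positions; during fine-tuning, gradients
# would be multiplied by these masks so that frozen positions never change.
masks = [((p.grad * p).abs() >= threshold).float() for p in model.parameters()]
print(sum(int(m.sum()) for m in masks), "of", sensitivity.numel(), "weights trainable")
```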
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded in our theory, how to choose the tunable parameters remains an open problem.
arXiv Detail & Related papers (2022-11-28T17:41:48Z) - Surrogate modeling for Bayesian optimization beyond a single Gaussian
process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed ensemble-GP Thompson sampling (EGP-TS) to the global optimum, an analysis is conducted based on the notion of Bayesian regret.
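The random feature-based kernel approximation mentioned above is easy to sketch in its standard Rahimi-Recht form for an RBF kernel; the ensemble-of-GPs and Thompson-sampling components of EGP-TS are not reproduced here.

```python
import numpy as np

def random_fourier_features(X, n_features=2000, lengthscale=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel:
    k(x, x') ~= phi(x) @ phi(x').  Working in this finite feature space lets
    posterior function samples be drawn at a cost linear in the number of
    observations, which is the scalability the summary refers to."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / lengthscale, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Phi = random_fourier_features(X)
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.max(np.abs(Phi @ Phi.T - K_exact)))   # error shrinks as n_features grows
```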
arXiv Detail & Related papers (2022-05-27T16:43:10Z) - Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z) - Efficient Hyperparameter Tuning with Dynamic Accuracy Derivative-Free
Optimization [0.27074235008521236]
We apply a recent dynamic accuracy derivative-free optimization method to hyperparameter tuning.
This method allows inexact evaluations of the learning problem while retaining convergence guarantees.
We demonstrate its robustness and efficiency compared to a fixed accuracy approach.
arXiv Detail & Related papers (2020-11-06T00:59:51Z) - Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian
Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper-parameter search space to select a better combination.
arXiv Detail & Related papers (2020-06-03T08:53:48Z) - Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z) - Online Parameter Estimation for Safety-Critical Systems with Gaussian
Processes [6.122161391301866]
We present a Bayesian optimization framework based on Gaussian processes (GPs) for online parameter estimation.
It uses an efficient search strategy over a response surface in the parameter space for finding the global optima with minimal function evaluations.
We demonstrate our technique on an actuated planar pendulum and safety-critical quadrotor in simulation with changing parameters.
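For orientation, a generic Gaussian-process-plus-expected-improvement loop of the kind such frameworks build on is sketched below; the objective, kernel, and grid-based acquisition maximization are assumptions made for illustration and are not the cited paper's online estimation strategy or its safety-critical setup.

```python
import numpy as np
from scipy.stats import norm

def objective(theta):                          # hypothetical measurement mismatch
    return (np.sin(3 * theta) + 0.5 * theta - 0.2) ** 2

def gp_posterior(X, y, Xs, ls=0.3, noise=1e-6):
    """Posterior mean and std of a zero-mean GP with an RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None]) ** 2 / ls ** 2)
    L = np.linalg.cholesky(k(X, X) + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = k(X, Xs)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)   # k(x, x) = 1
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
grid = np.linspace(-2.0, 2.0, 400)
X = rng.uniform(-2.0, 2.0, size=3)             # initial design
y = objective(X)
for _ in range(10):                            # sequential evaluations
    mu, sd = gp_posterior(X, y, grid)
    imp = y.min() - mu                         # expected improvement (minimization)
    ei = imp * norm.cdf(imp / sd) + sd * norm.pdf(imp / sd)
    x_next = grid[np.argmax(ei)]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
print(f"estimated parameter: {X[np.argmin(y)]:.3f}")
```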
arXiv Detail & Related papers (2020-02-18T20:38:00Z)