A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization
Method
- URL: http://arxiv.org/abs/2208.12118v2
- Date: Sun, 18 Jun 2023 12:40:24 GMT
- Title: A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization
Method
- Authors: Ankur Sinha, Satender Gunwal and Shivam Kumar
- Abstract summary: We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges at a lower computational cost and leads to models that generalize better on the test set.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization in machine learning is often performed using naive
techniques that only yield an approximate set of hyperparameters. Although
techniques such as Bayesian optimization perform an intelligent search over a
given domain of hyperparameters, they do not guarantee an optimal solution. A
major drawback of most of these approaches is that their search domain grows
exponentially with the number of hyperparameters, increasing the computational
cost and making them slow. The hyperparameter optimization problem is
inherently a bilevel optimization task, and some studies have attempted bilevel
solution methodologies for solving this problem. However, these studies assume
a unique set of model weights that minimize the training loss, an assumption
that is generally violated by deep learning architectures. This paper presents a
gradient-based bilevel method that addresses these drawbacks for solving the
hyperparameter optimization problem. The proposed method handles continuous
hyperparameters; in our experiments, we focus on the regularization
hyperparameter. The method is guaranteed to converge to the set of optimal
hyperparameters, a property that we prove theoretically. The idea is based on
approximating the lower-level optimal value function using Gaussian process
regression. As a result, the bilevel problem is reduced to a single level
constrained optimization task that is solved using the augmented Lagrangian
method. We have performed an extensive computational study on the MNIST and
CIFAR-10 datasets on multi-layer perceptron and LeNet architectures that
confirms the efficiency of the proposed method. A comparative study against
grid search, random search, Bayesian optimization, and HyperBand on various
hyperparameter problems shows that the proposed algorithm converges at a lower
computational cost and leads to models that generalize better on the test set.
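The abstract describes a concrete pipeline: sample the lower-level optimal value function, approximate it with Gaussian process regression, and solve the resulting single-level constrained problem with an augmented Lagrangian. Below is a minimal, self-contained sketch of that reduction, using ridge regression in place of the neural architectures studied in the paper; the dataset, the multiplier/penalty update schedule, and all function names (train_loss, phi_hat, ...) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

# Toy data; y is standardized so all losses are O(1).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.4, random_state=0)
d = X_tr.shape[1]

def train_loss(w, lam):
    # Lower-level objective f(w, lambda): regularized training loss.
    return np.mean((X_tr @ w - y_tr) ** 2) + lam * np.dot(w, w)

def val_loss(w):
    # Upper-level objective F(w): validation loss.
    return np.mean((X_val @ w - y_val) ** 2)

def lower_level_optimum(lam):
    # Exact ridge solution, used only to sample the optimal value function.
    w = np.linalg.solve(X_tr.T @ X_tr / len(y_tr) + lam * np.eye(d),
                        X_tr.T @ y_tr / len(y_tr))
    return train_loss(w, lam)

# Step 1: sample the lower-level optimal value function phi(lambda) and
# approximate it with Gaussian process regression (over log lambda).
log_lams = np.linspace(-6, 2, 15)
phis = np.array([lower_level_optimum(np.exp(t)) for t in log_lams])
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(log_lams.reshape(-1, 1), phis)

def phi_hat(log_lam):
    return gp.predict(np.array([[log_lam]]))[0]

# Step 2: single-level reduction, min_{lambda, w} F(w) subject to
# f(w, lambda) <= phi_hat(lambda), solved with a basic augmented Lagrangian.
def constraint(z):                 # g(z) = f(w, lambda) - phi_hat(lambda)
    return train_loss(z[1:], np.exp(z[0])) - phi_hat(z[0])

mu, rho = 0.0, 10.0                # multiplier and penalty parameter
z = np.zeros(d + 1)                # z = (log lambda, w)
for _ in range(10):
    def aug_lagrangian(z):
        return val_loss(z[1:]) + (rho / 2.0) * max(0.0, mu / rho + constraint(z)) ** 2
    z = minimize(aug_lagrangian, z, method="L-BFGS-B").x
    mu = max(0.0, mu + rho * constraint(z))  # dual (multiplier) update
    rho *= 2.0                               # tighten the penalty

print("selected lambda:", np.exp(z[0]), "validation MSE:", val_loss(z[1:]))
```

Here the lower-level optimum is available in closed form, so the value-function samples are exact; in the deep learning setting of the paper, each sample would instead come from an (approximate) training run.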
Related papers
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Analyzing Inexact Hypergradients for Bilevel Learning [0.09669369645900441]
We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation.
Our numerical results show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
arXiv Detail & Related papers (2023-01-11T23:54:27Z) - A Comparative study of Hyper-Parameter Optimization Tools [2.6097538974670935]
We compare the performance of four Python libraries, namely Optuna, Hyperopt, Optunity, and Sequential Model-based Algorithm Configuration (SMAC).
We find that Optuna performs better than the others on the CASH problem and the NeurIPS black-box optimization challenge.
arXiv Detail & Related papers (2022-01-17T14:49:36Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (a minimal sketch of the forward-mode idea appears after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters (a minimal zeroth-order sketch appears after this list).
arXiv Detail & Related papers (2021-02-17T21:03:05Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a powerful tool for many machine learning problems.
We propose a novel stochastic bilevel optimizer named stocBiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Efficient hyperparameter optimization by way of PAC-Bayes bound
minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z) - A Gradient-based Bilevel Optimization Approach for Tuning
Hyperparameters in Machine Learning [0.0]
We propose a bilevel solution method for solving the hyperparameter optimization problem.
The proposed method is general and can be easily applied to any class of machine learning algorithms.
We discuss the theory behind the proposed algorithm and perform an extensive computational study on two datasets.
arXiv Detail & Related papers (2020-07-21T18:15:08Z) - Cross Entropy Hyperparameter Optimization for Constrained Problem
Hamiltonians Applied to QAOA [68.11912614360878]
Hybrid quantum-classical algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) are considered among the most promising approaches for taking advantage of near-term quantum computers in practical applications.
Such algorithms are usually implemented in a variational form, combining a classical optimization method with a quantum machine to find good solutions to an optimization problem.
In this study we apply a Cross-Entropy method to shape the optimization landscape, which allows the classical optimizer to find better parameters more easily and hence results in improved performance.
arXiv Detail & Related papers (2020-03-11T13:52:41Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
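The implicit-differentiation entry above (forward-mode differentiation of proximal gradient descent) can be illustrated on the Lasso: propagate the derivative of the iterate with respect to the regularization parameter alongside the iterate itself, then chain it with the validation gradient to obtain a hypergradient. The sketch below is an assumption-laden illustration of that idea, not the paper's code; the data, step size, and names are chosen only for this example.

```python
import numpy as np

# Synthetic sparse regression problem (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 50
X_tr, X_val = rng.standard_normal((n, d)), rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:5] = 3.0
y_tr = X_tr @ w_true + 0.5 * rng.standard_normal(n)
y_val = X_val @ w_true + 0.5 * rng.standard_normal(n)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_forward_mode(lam, n_iter=500):
    """Proximal gradient descent on (1/2n)||X w - y||^2 + lam * ||w||_1,
    propagating J = dw/dlam through every iteration (forward mode)."""
    gamma = n / np.linalg.norm(X_tr, 2) ** 2          # step size 1/L
    w, J = np.zeros(d), np.zeros(d)
    for _ in range(n_iter):
        z = w - gamma * X_tr.T @ (X_tr @ w - y_tr) / n
        dz = J - gamma * X_tr.T @ (X_tr @ J) / n      # dz/dlam
        active = np.abs(z) > gamma * lam              # support after the prox step
        w = soft_threshold(z, gamma * lam)
        J = active * dz - gamma * np.sign(z) * active  # chain rule through the prox
    return w, J

lam = 0.1
w, J = lasso_forward_mode(lam)
grad_w = X_val.T @ (X_val @ w - y_val) / n   # gradient of (1/2n)||X_val w - y_val||^2
hypergrad = grad_w @ J                       # dF/dlam by the chain rule
print("hypergradient of the validation loss w.r.t. lambda:", hypergrad)
```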
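The HOZOG entry above treats training as a black box and updates hyperparameters with averaged zeroth-order hyper-gradients. The following sketch illustrates that general idea with a two-point zeroth-order estimator on ridge regression; the model, step size, and names are assumptions for illustration and not the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=20, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(0)

def val_loss(log_lam):
    """Upper-level objective: train with a fixed hyperparameter (treated as a
    black box) and evaluate on the validation split."""
    model = Ridge(alpha=np.exp(log_lam)).fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))

def zo_grad(f, x, mu=1e-2, n_samples=8):
    """Averaged two-point zeroth-order estimate of f'(x)."""
    us = rng.standard_normal(n_samples)
    return np.mean([(f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u for u in us])

log_lam, lr = 0.0, 0.1   # optimize in log-space so lambda stays positive
for _ in range(40):
    log_lam -= lr * zo_grad(val_loss, log_lam)
print("selected lambda:", np.exp(log_lam), "validation MSE:", val_loss(log_lam))
```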