A Gradient-based Bilevel Optimization Approach for Tuning
Hyperparameters in Machine Learning
- URL: http://arxiv.org/abs/2007.11022v1
- Date: Tue, 21 Jul 2020 18:15:08 GMT
- Title: A Gradient-based Bilevel Optimization Approach for Tuning
Hyperparameters in Machine Learning
- Authors: Ankur Sinha, Tanmay Khandait, Raja Mohanty
- Abstract summary: We propose a bilevel solution method for solving the hyperparameter optimization problem.
The proposed method is general and can be easily applied to any class of machine learning algorithms.
We discuss the theory behind the proposed algorithm and perform an extensive computational study on two datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter tuning is an active area of research in machine learning,
where the aim is to identify the optimal hyperparameters that provide the best
performance on the validation set. Hyperparameter tuning is often achieved
using naive techniques, such as random search and grid search. However, most of
these methods seldom lead to an optimal set of hyperparameters and often become
very expensive. In this paper, we propose a bilevel solution method for solving
the hyperparameter optimization problem that does not suffer from the drawbacks
of the earlier studies. The proposed method is general and can be easily
applied to any class of machine learning algorithms. The idea is based on the
approximation of the lower level optimal value function mapping, which is an
important mapping in bilevel optimization and helps in reducing the bilevel
problem to a single level constrained optimization task. The single-level
constrained optimization problem is solved using the augmented Lagrangian
method. We discuss the theory behind the proposed algorithm and perform an
extensive computational study on two datasets that confirms the efficiency of
the proposed method. A comparative study against grid search, random search
and Bayesian optimization shows that the proposed algorithm is multiple times
faster on problems with one or two hyperparameters. The computational gain is
expected to be significantly higher as the number of hyperparameters increases.
For a given hyperparameter vector, most techniques in the literature assume a
unique optimal parameter set that minimizes the loss on the training set. This
assumption is often violated by deep learning architectures; the proposed
method does not require it.
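To make the reduction described above concrete, the following is a minimal
sketch of the value-function reformulation and of the augmented Lagrangian that
handles the resulting constraint. The notation (validation loss F, training
loss f, hyperparameters \lambda, model parameters w) and the exact form of the
value-function approximation \hat{\varphi} are generic assumptions for
illustration, not the paper's exact construction:

  Bilevel problem:       \min_{\lambda, w} F(\lambda, w) \;\text{ s.t. }\; w \in \arg\min_{w'} f(\lambda, w')
  Value function:        \varphi(\lambda) = \min_{w'} f(\lambda, w')
  Single-level problem:  \min_{\lambda, w} F(\lambda, w) \;\text{ s.t. }\; f(\lambda, w) \le \hat{\varphi}(\lambda), \quad \hat{\varphi} \approx \varphi
  Augmented Lagrangian:  L_\rho(\lambda, w, \mu) = F(\lambda, w) + \frac{\rho}{2}\big(\max\{0,\; f(\lambda, w) - \hat{\varphi}(\lambda) + \mu/\rho\}\big)^2 - \frac{\mu^2}{2\rho}
  Multiplier update:     \mu \leftarrow \max\{0,\; \mu + \rho\,(f(\lambda, w) - \hat{\varphi}(\lambda))\}

Minimizing L_\rho over (\lambda, w) and updating \mu (and, if needed, \rho)
between outer iterations is the standard augmented Lagrangian loop; because the
reduction only uses the optimal value \varphi(\lambda), it does not require the
lower-level minimizer to be unique.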
Related papers
- Analyzing Inexact Hypergradients for Bilevel Learning [0.09669369645900441]
We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation.
Our numerical results show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
arXiv Detail & Related papers (2023-01-11T23:54:27Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- Hyper-parameter optimization based on soft actor critic and hierarchical mixture regularization [5.063728016437489]
We model the hyper-parameter optimization process as a Markov decision process and tackle it with reinforcement learning.
A novel hyper-parameter optimization method based on soft actor critic and hierarchical mixture regularization is proposed.
arXiv Detail & Related papers (2021-12-08T02:34:43Z)
- Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method based on the Bregman distance (SBiO-BreD).
We also propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) by using a variance-reduction technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z)
- BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters (a toy finite-difference sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel sample-efficient hypergradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Efficient hyperparameter optimization by way of PAC-Bayes bound minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
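As a rough, self-contained illustration of the hyper-gradient ideas surveyed
above (in particular the zeroth-order HOZOG entry), the toy Python sketch below
tunes a single ridge-regression regularization weight by central finite
differences of a validation loss. The data, the closed-form inner solver and
the helper names (train_ridge, val_loss, hypergrad_fd) are hypothetical and not
taken from any of the listed papers.

# Toy zeroth-order (finite-difference) hyper-gradient sketch, not from the papers above.
import numpy as np

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(80, 5)), rng.normal(size=80)   # training split
X_va, y_va = rng.normal(size=(40, 5)), rng.normal(size=40)   # validation split

def train_ridge(lam):
    # Lower level: closed-form ridge solution w*(lam) on the training split.
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def val_loss(lam):
    # Upper level: validation mean squared error at the trained parameters.
    w = train_ridge(lam)
    return np.mean((X_va @ w - y_va) ** 2)

def hypergrad_fd(lam, eps=1e-4):
    # Zeroth-order hyper-gradient: central finite difference, treating the
    # inner training procedure as a black box (no inner gradients needed).
    return (val_loss(lam + eps) - val_loss(lam - eps)) / (2.0 * eps)

lam = 1.0
for _ in range(50):  # plain gradient descent on the hyperparameter
    lam = max(1e-3, lam - 0.5 * hypergrad_fd(lam))
print(f"selected lambda ~ {lam:.4f}, validation MSE ~ {val_loss(lam):.4f}")

In practice the inner problem is rarely available in closed form, so the finite
difference is taken around an approximately trained model, which is exactly the
regime the gradient-based and zeroth-order methods above are designed for.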