Efficient hyperparameter optimization by way of PAC-Bayes bound
minimization
- URL: http://arxiv.org/abs/2008.06431v1
- Date: Fri, 14 Aug 2020 15:54:51 GMT
- Title: Efficient hyperparameter optimization by way of PAC-Bayes bound
minimization
- Authors: John J. Cherian, Andrew G. Taube, Robert T. McGibbon, Panagiotis
Angelikopoulos, Guy Blanc, Michael Snarski, Daniel D. Richman, John L.
Klepeis, David E. Shaw
- Abstract summary: We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
- Score: 4.191847852775072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying optimal values for a high-dimensional set of hyperparameters is a
problem that has received growing attention given its importance to large-scale
machine learning applications such as neural architecture search. Recently
developed optimization methods can be used to select thousands or even millions
of hyperparameters. Such methods often yield overfit models, however, leading
to poor performance on unseen data. We argue that this overfitting results from
using the standard hyperparameter optimization objective function. Here we
present an alternative objective that is equivalent to a Probably Approximately
Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error. We then
devise an efficient gradient-based algorithm to minimize this objective; the
proposed method has asymptotic space and time complexity equal to or better
than other gradient-based hyperparameter optimization methods. We show that
this new method significantly reduces out-of-sample error when applied to
hyperparameter optimization problems known to be prone to overfitting.
Related papers
- HomOpt: A Homotopy-Based Hyperparameter Optimization Method [10.11271414863925]
We propose HomOpt, a data-driven approach based on a generalized additive model (GAM) surrogate combined with homotopy optimization.
We show how HomOpt can boost the performance and effectiveness of any given method with faster convergence to the optimum on continuous discrete, and categorical domain spaces.
arXiv Detail & Related papers (2023-08-07T06:01:50Z) - A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization
Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Online Hyperparameter Meta-Learning with Hypergradient Distillation [59.973770725729636]
gradient-based meta-learning methods assume a set of parameters that do not participate in inner-optimization.
We propose a novel HO method that can overcome these limitations, by approximating the second-order term with knowledge distillation.
arXiv Detail & Related papers (2021-10-06T05:14:53Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG)
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyper parameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z) - Hyper-parameter estimation method with particle swarm optimization [0.8883733362171032]
The PSO method cannot be directly used in the problem of hyper- parameters estimation.
The proposed method uses the swarm method to optimize the performance of the acquisition function.
The results on several problems are improved.
arXiv Detail & Related papers (2020-11-24T07:51:51Z) - A Gradient-based Bilevel Optimization Approach for Tuning
Hyperparameters in Machine Learning [0.0]
We propose a bilevel solution method for solving the hyperparameter optimization problem.
The proposed method is general and can be easily applied to any class of machine learning algorithms.
We discuss the theory behind the proposed algorithm and perform extensive computational study on two datasets.
arXiv Detail & Related papers (2020-07-21T18:15:08Z) - Online Hyperparameter Search Interleaved with Proximal Parameter Updates [9.543667840503739]
We develop a method that relies on the structure of proximal gradient methods and does not require a smooth cost function.
Such a method is applied to Leave-one-out (LOO)-validated Lasso and Group Lasso.
Numerical experiments corroborate the convergence of the proposed method to a local optimum of the LOO validation error curve.
arXiv Detail & Related papers (2020-04-06T15:54:03Z) - Cross Entropy Hyperparameter Optimization for Constrained Problem
Hamiltonians Applied to QAOA [68.11912614360878]
Hybrid quantum-classical algorithms such as Quantum Approximate Optimization Algorithm (QAOA) are considered as one of the most encouraging approaches for taking advantage of near-term quantum computers in practical applications.
Such algorithms are usually implemented in a variational form, combining a classical optimization method with a quantum machine to find good solutions to an optimization problem.
In this study we apply a Cross-Entropy method to shape this landscape, which allows the classical parameter to find better parameters more easily and hence results in an improved performance.
arXiv Detail & Related papers (2020-03-11T13:52:41Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.