Hyper-parameter estimation method with particle swarm optimization
- URL: http://arxiv.org/abs/2011.11944v2
- Date: Mon, 14 Dec 2020 04:16:34 GMT
- Title: Hyper-parameter estimation method with particle swarm optimization
- Authors: Yaru Li, Yulai Zhang
- Abstract summary: The PSO method cannot be directly used in the problem of hyper-parameter estimation.
The proposed method uses the particle swarm method to optimize the acquisition function.
The results on several benchmark problems are improved.
- Score: 0.8883733362171032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The particle swarm optimization (PSO) method cannot be used directly for hyper-parameter estimation, since the mathematical form of the mapping from hyper-parameters to the loss function or generalization accuracy is unclear. The Bayesian optimization (BO) framework converts the optimization of the hyper-parameters into the optimization of an acquisition function. Because the acquisition function is non-convex and multi-peaked, the problem is well suited to PSO. The method proposed in this paper uses the particle swarm method to optimize the acquisition function within the BO framework in order to obtain better hyper-parameters. The performance of the proposed method is evaluated and demonstrated on both classification and regression models, and the results on several benchmark problems are improved.
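To make the procedure concrete, the sketch below shows PSO used as the inner optimizer of a BO acquisition function. It is a minimal illustration under assumptions, not the authors' implementation: the Gaussian-process surrogate and Matern kernel come from scikit-learn, the acquisition is expected improvement, and the PSO constants (inertia w, c1, c2), function names, and toy objective are all illustrative.

```python
# Minimal sketch, NOT the authors' code: a GP surrogate models the map from
# hyper-parameters to validation loss, and PSO maximizes the (non-convex,
# multi-peaked) acquisition function inside the BO loop. The EI acquisition,
# PSO constants (w, c1, c2) and all names are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(x, gp, y_best):
    """EI acquisition for minimization (larger is better)."""
    mu, sigma = gp.predict(np.atleast_2d(x), return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (y_best - mu) / sigma
    return float(((y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z))[0])

def pso_maximize(f, bounds, n_particles=30, n_iters=50, w=0.7, c1=1.5, c2=1.5):
    """Maximize f over a box with a plain global-best PSO."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([f(p) for p in x])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest

def bo_with_pso(objective, bounds, n_init=5, n_rounds=20):
    """BO loop: fit the GP on evaluated points, pick the next point by PSO on EI."""
    rng = np.random.default_rng(1)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, bounds.shape[0]))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_rounds):
        gp.fit(X, y)
        x_next = pso_maximize(lambda x: expected_improvement(x, gp, y.min()), bounds)
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()

if __name__ == "__main__":
    # Toy stand-in for "train a model and return its validation loss".
    loss = lambda h: np.sin(3 * h[0]) + (h[0] - 0.7) ** 2 + 0.3 * (h[1] + 0.2) ** 2
    best_h, best_loss = bo_with_pso(loss, bounds=np.array([[-2.0, 2.0], [-2.0, 2.0]]))
    print("best hyper-parameters:", best_h, "validation loss:", best_loss)
```

In an actual hyper-parameter search, `objective` would train the model with the candidate hyper-parameters and return a validation loss or error; the surrounding loop is unchanged.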
Related papers
- How to Prove the Optimized Values of Hyperparameters for Particle Swarm Optimization? [0.0]
This study proposes an analytic framework to analyze the optimized average-fitness-function-value (AFFV) based on mathematical models for a variety of fitness functions.
Experimental results show that the hyper-parameter values from the proposed method achieve higher convergence efficiency and lower AFFVs.
arXiv Detail & Related papers (2023-02-01T00:33:35Z)
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
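The exact algorithm of the bilevel paper above is not reproduced here; the sketch below only illustrates the generic gradient-based bilevel idea on a ridge-regression inner problem, where the implicit function theorem gives the hyper-gradient in closed form. The inner problem, the descent on log(lambda), and all names are assumptions made for illustration.

```python
# Minimal sketch (not the paper's algorithm) of gradient-based bilevel HPO:
# inner problem = ridge regression with closed-form solution, outer problem =
# validation loss, hyper-gradient obtained via the implicit function theorem.
import numpy as np

def inner_solution(X_tr, y_tr, lam):
    """w*(lam) = argmin_w 0.5||X_tr w - y_tr||^2 + 0.5*lam*||w||^2."""
    d = X_tr.shape[1]
    A = X_tr.T @ X_tr + lam * np.eye(d)
    return np.linalg.solve(A, X_tr.T @ y_tr), A

def hypergradient(X_tr, y_tr, X_val, y_val, lam):
    """dF/dlam for F(lam) = 0.5||X_val w*(lam) - y_val||^2."""
    w, A = inner_solution(X_tr, y_tr, lam)
    grad_w_F = X_val.T @ (X_val @ w - y_val)   # outer gradient w.r.t. w
    dw_dlam = -np.linalg.solve(A, w)           # implicit function theorem: dw*/dlam = -A^{-1} w*
    return float(grad_w_F @ dw_dlam)

def bilevel_hpo(X_tr, y_tr, X_val, y_val, log_lam=0.0, lr=0.1, n_steps=100):
    """Gradient descent on log(lam) so the hyperparameter stays positive."""
    for _ in range(n_steps):
        lam = np.exp(log_lam)
        g = hypergradient(X_tr, y_tr, X_val, y_val, lam)
        log_lam -= lr * g * lam                # chain rule: dF/dlog(lam) = dF/dlam * lam
    return np.exp(log_lam)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = X @ rng.normal(size=20) + 0.5 * rng.normal(size=200)
    print("selected regularization:", bilevel_hpo(X[:100], y[:100], X[100:], y[100:]))
```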
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Hyper-parameter optimization based on soft actor critic and hierarchical mixture regularization [5.063728016437489]
We model the hyper-parameter optimization process as a Markov decision process and tackle it with reinforcement learning.
A novel hyper-parameter optimization method based on soft actor-critic and hierarchical mixture regularization is proposed.
arXiv Detail & Related papers (2021-12-08T02:34:43Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
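For the implicit-differentiation entry above, the sketch below shows forward-mode (iterative) differentiation of proximal gradient descent (ISTA) for the Lasso: the iterate and its derivative with respect to the regularization parameter are propagated together, and the derivative sequence converges toward the exact Jacobian. This is a hedged illustration under assumptions, not the paper's code; the toy data and variable names are invented.

```python
# Minimal sketch of forward-mode differentiation of ISTA for the Lasso:
# beta and J = d beta / d lam are updated jointly at every proximal step.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_with_jacobian(X, y, lam, n_iters=500):
    """Run ISTA on 0.5||X b - y||^2 + lam*||b||_1, carrying J = d b / d lam forward."""
    d = X.shape[1]
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    beta, J = np.zeros(d), np.zeros(d)
    for _ in range(n_iters):
        z = beta - X.T @ (X @ beta - y) / L
        dz = J - X.T @ (X @ J) / L         # derivative of the gradient step w.r.t. lam
        active = np.abs(z) > lam / L       # support kept by soft-thresholding
        beta = soft_threshold(z, lam / L)
        J = np.where(active, dz - np.sign(z) / L, 0.0)
    return beta, J

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 30))
    y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=100)
    beta, J = ista_with_jacobian(X, y, lam=5.0)
    # J can feed a hyper-gradient of a validation criterion for lam selection.
    print("nonzeros:", int(np.sum(beta != 0)), " ||dbeta/dlam||:", np.linalg.norm(J))
```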
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyper-parameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
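The HOZOG entry above averages zeroth-order hyper-gradients; the sketch below illustrates that estimator in the simplest setting, where "training" is a closed-form ridge solve and the hyper-gradient is estimated from finite differences along random Gaussian directions. It is not the HOZOG algorithm itself; all names and constants are assumptions.

```python
# Minimal sketch of an averaged zeroth-order hyper-gradient: the validation
# loss is a black box of the hyper-parameter, and its gradient is estimated
# from finite differences along random directions, then averaged.
import numpy as np

def validation_loss(lam_vec, X_tr, y_tr, X_val, y_val):
    """Black box: train with hyper-parameter exp(lam_vec[0]), report validation loss."""
    lam = np.exp(lam_vec[0])
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    r = X_val @ w - y_val
    return 0.5 * float(r @ r)

def zeroth_order_hypergrad(f, lam_vec, n_dirs=10, mu=1e-2, rng=None):
    """Average of (f(lam + mu*u) - f(lam)) / mu * u over random directions u."""
    if rng is None:
        rng = np.random.default_rng(0)
    f0, g = f(lam_vec), np.zeros_like(lam_vec)
    for _ in range(n_dirs):
        u = rng.normal(size=lam_vec.shape)
        g += (f(lam_vec + mu * u) - f0) / mu * u
    return g / n_dirs

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))
    y = X @ rng.normal(size=20) + 0.5 * rng.normal(size=200)
    f = lambda v: validation_loss(v, X[:100], y[:100], X[100:], y[100:])
    lam_vec = np.zeros(1)
    for _ in range(50):                     # hyper-parameter updates with the averaged estimate
        lam_vec -= 0.1 * zeroth_order_hypergrad(f, lam_vec, rng=rng)
    print("selected lambda:", float(np.exp(lam_vec[0])))
```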
- Efficient hyperparameter optimization by way of PAC-Bayes bound minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z)
- Online Hyperparameter Search Interleaved with Proximal Parameter Updates [9.543667840503739]
We develop a method that relies on the structure of proximal gradient methods and does not require a smooth cost function.
Such a method is applied to Leave-one-out (LOO)-validated Lasso and Group Lasso.
Numerical experiments corroborate the convergence of the proposed method to a local optimum of the LOO validation error curve.
arXiv Detail & Related papers (2020-04-06T15:54:03Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.