Lazy Parameter Tuning and Control: Choosing All Parameters Randomly From
a Power-Law Distribution
- URL: http://arxiv.org/abs/2104.06714v5
- Date: Fri, 10 Mar 2023 12:18:38 GMT
- Title: Lazy Parameter Tuning and Control: Choosing All Parameters Randomly From
a Power-Law Distribution
- Authors: Denis Antipov, Maxim Buzdalov, Benjamin Doerr
- Abstract summary: Most evolutionary algorithms have multiple parameters and their values drastically affect the performance.
We propose a lazy but effective solution, namely choosing all parameter values in each iteration randomly from a suitably scaled power-law distribution.
We prove a performance guarantee that is comparable to the best performance known for static parameters.
- Score: 8.34061303235504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most evolutionary algorithms have multiple parameters and their values
drastically affect the performance. Due to the often complicated interplay of
the parameters, setting these values right for a particular problem (parameter
tuning) is a challenging task. This task becomes even more complicated when the
optimal parameter values change significantly during the run of the algorithm
since then a dynamic parameter choice (parameter control) is necessary.
In this work, we propose a lazy but effective solution, namely choosing all
parameter values (where this makes sense) in each iteration randomly from a
suitably scaled power-law distribution. To demonstrate the effectiveness of
this approach, we perform runtime analyses of the $(1+(\lambda,\lambda))$
genetic algorithm with all three parameters chosen in this manner. We show that
this algorithm on the one hand can imitate simple hill-climbers like the
$(1+1)$ EA, giving the same asymptotic runtime on problems like OneMax,
LeadingOnes, or Minimum Spanning Tree. On the other hand, this algorithm is
also very efficient on jump functions, where the best static parameters are
very different from those necessary to optimize simple problems. We prove a
performance guarantee that is comparable to the best performance known for
static parameters. For the most interesting case that the jump size $k$ is
constant, we prove that our performance is asymptotically better than what can
be obtained with any static parameter choice. We complement our theoretical
results with a rigorous empirical study confirming what the asymptotic runtime
results suggest.
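To make the abstract's scheme concrete, below is a minimal, hedged sketch (in Python) of a $(1+(\lambda,\lambda))$-style genetic algorithm in which all three parameters (offspring population size $\lambda$, mutation rate $p$, and crossover bias $c$) are drawn anew in every iteration from power-law distributions. The power-law exponent, the parameter ranges, and the exact scalings of $p$ and $c$ used here are illustrative assumptions, not the calibration analysed in the paper.

```python
import math
import random

def power_law_sample(upper, beta=2.5):
    """Sample an integer from {1, ..., upper} with Pr[X = i] proportional to i^(-beta).
    The exponent beta = 2.5 and the range are illustrative choices, not the paper's calibration."""
    weights = [i ** (-beta) for i in range(1, upper + 1)]
    return random.choices(range(1, upper + 1), weights=weights, k=1)[0]

def lazy_opll_ga(f, n, budget=100_000):
    """Hedged sketch of a (1+(lambda,lambda)) GA with all three parameters
    (offspring population size, mutation rate, crossover bias) re-drawn in every
    iteration from power-law distributions ("lazy" parameter control).
    The scalings of p and c below are plausible placeholders."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    evals = 1
    while evals < budget and fx < n:  # assumes maximum fitness n, as for OneMax
        lam = power_law_sample(n)      # offspring population size
        p = power_law_sample(n) / n    # mutation rate in (0, 1]
        c = 1.0 / power_law_sample(n)  # crossover bias in (0, 1]

        # Mutation phase: lam offspring, each flipping the same number ell of bits,
        # where ell ~ Bin(n, p) is sampled once per iteration.
        ell = sum(random.random() < p for _ in range(n))
        best_mut, best_mut_f = None, -math.inf
        for _ in range(lam):
            y = x[:]
            for i in random.sample(range(n), ell):
                y[i] = 1 - y[i]
            fy = f(y)
            evals += 1
            if fy > best_mut_f:
                best_mut, best_mut_f = y, fy

        # Crossover phase: lam biased crossovers between the parent x and the mutation winner.
        best_cross, best_cross_f = x, fx
        for _ in range(lam):
            z = [best_mut[i] if random.random() < c else x[i] for i in range(n)]
            fz = f(z)
            evals += 1
            if fz >= best_cross_f:
                best_cross, best_cross_f = z, fz

        # Elitist selection: keep the crossover winner if it is at least as good as x.
        if best_cross_f >= fx:
            x, fx = best_cross, best_cross_f
    return x, fx

if __name__ == "__main__":
    onemax = lambda bits: sum(bits)  # example benchmark from the abstract
    best, value = lazy_opll_ga(onemax, n=50)
    print("OneMax value reached:", value)
```

The "lazy" aspect is visible in the main loop: no parameter is tuned in advance and no adaptive mechanism is maintained; each iteration simply re-samples $\lambda$, $p$, and $c$.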
Related papers
- A Multi-objective Newton Optimization Algorithm for Hyper-Parameter
Search [0.0]
The algorithm is applied to search the optimal probability threshold (a vector of eight parameters) for a multiclass object detection problem of a convolutional neural network.
The algorithm produces overall higher true positive (TP) and lower false positive (FP) rates compared to using the default value of 0.5.
arXiv Detail & Related papers (2024-01-07T21:12:34Z)
- Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optima.
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, how to choose the tunable parameters remains an open problem.
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyperparameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Optimal Static Mutation Strength Distributions for the $(1+\lambda)$
Evolutionary Algorithm on OneMax [1.0965065178451106]
We show that, for large enough population sizes, such optimal distributions may be surprisingly complicated and counter-intuitive.
arXiv Detail & Related papers (2021-02-09T16:56:25Z)
- Parameters for the best convergence of an optimization algorithm
On-The-Fly [0.0]
This research follows an experimental setup in which five different algorithms were tested on different objective functions.
To find the correct parameters, a method called 'on-the-fly' was applied.
arXiv Detail & Related papers (2020-09-23T21:38:28Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Weighted Random Search for Hyperparameter Optimization [0.0]
We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms.
We generate new values for each hyperparameter with a probability of change, unlike the standard RS (see the sketch after this entry).
Within the same computational budget, our method yields better results than the standard RS.
arXiv Detail & Related papers (2020-04-03T15:41:22Z)
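As referenced in the Weighted Random Search entry above, here is a minimal sketch of that resampling idea under stated assumptions: each hyperparameter of a new candidate is re-drawn only with some probability of change and otherwise keeps its value from the best configuration found so far. The uniform probability of change, the uniform value sampling, and the keep-the-incumbent rule are simplifying assumptions, not the paper's exact weighting scheme.

```python
import random

def weighted_random_search(objective, param_ranges, trials=100, p_change=0.5):
    """Hedged sketch of a Weighted-Random-Search-style loop: unlike standard
    Random Search, each hyperparameter is re-sampled only with probability
    p_change; otherwise it keeps its value from the best configuration so far.
    The uniform p_change and value ranges are illustrative assumptions."""
    # start from one fully random configuration
    best = {name: random.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
    best_score = objective(best)
    for _ in range(trials - 1):
        candidate = {}
        for name, (lo, hi) in param_ranges.items():
            if random.random() < p_change:
                candidate[name] = random.uniform(lo, hi)  # re-sample this hyperparameter
            else:
                candidate[name] = best[name]              # reuse the incumbent value
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Example usage with a toy objective to be maximized
if __name__ == "__main__":
    toy = lambda cfg: -(cfg["lr"] - 0.1) ** 2 - (cfg["momentum"] - 0.9) ** 2
    print(weighted_random_search(toy, {"lr": (0.0, 1.0), "momentum": (0.0, 1.0)}))
```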
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.