Lazy Parameter Tuning and Control: Choosing All Parameters Randomly From
a Power-Law Distribution
- URL: http://arxiv.org/abs/2104.06714v5
- Date: Fri, 10 Mar 2023 12:18:38 GMT
- Title: Lazy Parameter Tuning and Control: Choosing All Parameters Randomly From
a Power-Law Distribution
- Authors: Denis Antipov, Maxim Buzdalov, Benjamin Doerr
- Abstract summary: Most evolutionary algorithms have multiple parameters and their values drastically affect the performance.
We propose a lazy but effective solution, namely choosing all parameter values in each iteration randomly from a suitably scaled power-law distribution.
We prove a performance guarantee that is comparable to the best performance known for static parameters.
- Score: 8.34061303235504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most evolutionary algorithms have multiple parameters and their values
drastically affect the performance. Due to the often complicated interplay of
the parameters, setting these values right for a particular problem (parameter
tuning) is a challenging task. This task becomes even more complicated when the
optimal parameter values change significantly during the run of the algorithm
since then a dynamic parameter choice (parameter control) is necessary.
In this work, we propose a lazy but effective solution, namely choosing all
parameter values (where this makes sense) in each iteration randomly from a
suitably scaled power-law distribution. To demonstrate the effectiveness of
this approach, we perform runtime analyses of the $(1+(\lambda,\lambda))$
genetic algorithm with all three parameters chosen in this manner. We show that
this algorithm on the one hand can imitate simple hill-climbers like the
$(1+1)$ EA, giving the same asymptotic runtime on problems like OneMax,
LeadingOnes, or Minimum Spanning Tree. On the other hand, this algorithm is
also very efficient on jump functions, where the best static parameters are
very different from those necessary to optimize simple problems. We prove a
performance guarantee that is comparable to the best performance known for
static parameters. For the most interesting case that the jump size $k$ is
constant, we prove that our performance is asymptotically better than what can
be obtained with any static parameter choice. We complement our theoretical
results with a rigorous empirical study confirming what the asymptotic runtime
results suggest.
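To make the abstract's scheme concrete, below is a minimal, hedged sketch (in Python) of a $(1+(\lambda,\lambda))$-style genetic algorithm in which all three parameters (offspring population size $\lambda$, mutation rate $p$, and crossover bias $c$) are drawn anew in every iteration from power-law distributions. The power-law exponent, the parameter ranges, and the exact scalings of $p$ and $c$ used here are illustrative assumptions, not the calibration analysed in the paper.

```python
import math
import random

def power_law_sample(upper, beta=2.5):
    """Sample an integer from {1, ..., upper} with Pr[X = i] proportional to i^(-beta).
    The exponent beta = 2.5 and the range are illustrative choices, not the paper's calibration."""
    weights = [i ** (-beta) for i in range(1, upper + 1)]
    return random.choices(range(1, upper + 1), weights=weights, k=1)[0]

def lazy_opll_ga(f, n, budget=100_000):
    """Hedged sketch of a (1+(lambda,lambda)) GA with all three parameters
    (offspring population size, mutation rate, crossover bias) re-drawn in every
    iteration from power-law distributions ("lazy" parameter control).
    The scalings of p and c below are plausible placeholders."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    evals = 1
    while evals < budget and fx < n:  # assumes maximum fitness n, as for OneMax
        lam = power_law_sample(n)      # offspring population size
        p = power_law_sample(n) / n    # mutation rate in (0, 1]
        c = 1.0 / power_law_sample(n)  # crossover bias in (0, 1]

        # Mutation phase: lam offspring, each flipping the same number ell of bits,
        # where ell ~ Bin(n, p) is sampled once per iteration.
        ell = sum(random.random() < p for _ in range(n))
        best_mut, best_mut_f = None, -math.inf
        for _ in range(lam):
            y = x[:]
            for i in random.sample(range(n), ell):
                y[i] = 1 - y[i]
            fy = f(y)
            evals += 1
            if fy > best_mut_f:
                best_mut, best_mut_f = y, fy

        # Crossover phase: lam biased crossovers between the parent x and the mutation winner.
        best_cross, best_cross_f = x, fx
        for _ in range(lam):
            z = [best_mut[i] if random.random() < c else x[i] for i in range(n)]
            fz = f(z)
            evals += 1
            if fz >= best_cross_f:
                best_cross, best_cross_f = z, fz

        # Elitist selection: keep the crossover winner if it is at least as good as x.
        if best_cross_f >= fx:
            x, fx = best_cross, best_cross_f
    return x, fx

if __name__ == "__main__":
    onemax = lambda bits: sum(bits)  # example benchmark from the abstract
    best, value = lazy_opll_ga(onemax, n=50)
    print("OneMax value reached:", value)
```

The "lazy" aspect is visible in the main loop: no parameter is tuned in advance and no adaptive mechanism is maintained; each iteration simply re-samples $\lambda$, $p$, and $c$.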
Related papers
- A Multi-objective Newton Optimization Algorithm for Hyper-Parameter
Search [0.0]
The algorithm is applied to search the optimal probability threshold (a vector of eight parameters) for a multiclass object detection problem of a convolutional neural network.
The algorithm produces overall higher true positive (TP) and lower false positive (FP) rates compared to using the default value of 0.5.
arXiv Detail & Related papers (2024-01-07T21:12:34Z)
- Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optima.
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, how to choose the tunable parameters remains an open problem.
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyperparameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Optimal Static Mutation Strength Distributions for the $(1+\lambda)$
Evolutionary Algorithm on OneMax [1.0965065178451106]
We show that, for large enough population sizes, such optimal distributions may be surprisingly complicated and counter-intuitive.
arXiv Detail & Related papers (2021-02-09T16:56:25Z)
- Parameters for the best convergence of an optimization algorithm
On-The-Fly [0.0]
This research follows an experimental setup in which five different algorithms were tested on different objective functions.
To find the correct parameters, a method called 'on-the-fly' was applied.
arXiv Detail & Related papers (2020-09-23T21:38:28Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Weighted Random Search for Hyperparameter Optimization [0.0]
We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms.
We generate new values for each hyperparameter with a probability of change, unlike the standard RS (see the sketch after this entry).
Within the same computational budget, our method yields better results than the standard RS.
arXiv Detail & Related papers (2020-04-03T15:41:22Z)
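As referenced in the Weighted Random Search entry above, here is a minimal sketch of that resampling idea under stated assumptions: each hyperparameter of a new candidate is re-drawn only with some probability of change and otherwise keeps its value from the best configuration found so far. The uniform probability of change, the uniform value sampling, and the keep-the-incumbent rule are simplifying assumptions, not the paper's exact weighting scheme.

```python
import random

def weighted_random_search(objective, param_ranges, trials=100, p_change=0.5):
    """Hedged sketch of a Weighted-Random-Search-style loop: unlike standard
    Random Search, each hyperparameter is re-sampled only with probability
    p_change; otherwise it keeps its value from the best configuration so far.
    The uniform p_change and value ranges are illustrative assumptions."""
    # start from one fully random configuration
    best = {name: random.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
    best_score = objective(best)
    for _ in range(trials - 1):
        candidate = {}
        for name, (lo, hi) in param_ranges.items():
            if random.random() < p_change:
                candidate[name] = random.uniform(lo, hi)  # re-sample this hyperparameter
            else:
                candidate[name] = best[name]              # reuse the incumbent value
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Example usage with a toy objective to be maximized
if __name__ == "__main__":
    toy = lambda cfg: -(cfg["lr"] - 0.1) ** 2 - (cfg["momentum"] - 0.9) ** 2
    print(weighted_random_search(toy, {"lr": (0.0, 1.0), "momentum": (0.0, 1.0)}))
```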
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.