SMOOTHIE: A Theory of Hyper-parameter Optimization for Software
Analytics
- URL: http://arxiv.org/abs/2401.09622v1
- Date: Wed, 17 Jan 2024 22:23:29 GMT
- Title: SMOOTHIE: A Theory of Hyper-parameter Optimization for Software
Analytics
- Authors: Rahul Yedida and Tim Menzies
- Abstract summary: This paper implements and tests SMOOTHIE, a novel hyper-parameter optimizer that guides its optimizations via considerations of ``smoothness''.
Experiments include GitHub issue lifetime prediction, detecting false alarms in static code warnings, and defect prediction.
Better yet, SMOOTHIE ran 300% faster than the prior state-of-the-art.
- Score: 14.0078949388954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyper-parameter optimization is the black art of tuning a learner's control
parameters. In software analytics, a repeated result is that such tuning can
result in dramatic performance improvements. Despite this, hyper-parameter
optimization is often applied rarely or poorly in software analytics--perhaps
because the CPU cost of exploring all those parameter options can be
prohibitive.
We theorize that learners generalize better when the loss landscape is
``smooth''. This theory is useful since the influence on ``smoothness'' of
different hyper-parameter choices can be tested very quickly (e.g. for a deep
learner, after just one epoch).
To test this theory, this paper implements and tests SMOOTHIE, a novel
hyper-parameter optimizer that guides its optimizations via considerations of
``smoothness''. The experiments of this paper test SMOOTHIE on numerous SE tasks
including (a) GitHub issue lifetime prediction; (b) detecting false alarms in
static code warnings; (c) defect prediction, and (d) a set of standard ML
datasets. In all these experiments, SMOOTHIE out-performed state-of-the-art
optimizers. Better yet, SMOOTHIE ran 300% faster than the prior
state-of-the-art. We hence conclude that this theory (that hyper-parameter optimization is
best viewed as a ``smoothing'' function for the decision landscape), is both
theoretically interesting and practically very useful.
To support open science and other researchers working in this area, all our
scripts and datasets are available on-line at
https://github.com/yrahul3910/smoothness-hpo/.
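The core idea above (probe each hyper-parameter candidate cheaply, e.g. after one epoch, and prefer candidates whose loss landscape looks smoother) can be sketched as follows. This is only an illustration under assumptions: the smoothness proxy here (mean loss change under small random weight perturbations), the linear model, and the single gradient-descent "epoch" are all stand-ins, not SMOOTHIE's actual metric or learners.

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model -- a stand-in for any learner."""
    return float(np.mean((X @ w - y) ** 2))

def smoothness_proxy(w, X, y, eps=1e-2, trials=8, seed=0):
    """Hypothetical proxy: lower value = flatter (smoother) region around w."""
    rng = np.random.default_rng(seed)
    base = loss(w, X, y)
    bumps = []
    for _ in range(trials):
        d = rng.normal(size=w.shape)
        d *= eps / np.linalg.norm(d)          # unit-length bump, scaled by eps
        bumps.append(abs(loss(w + d, X, y) - base))
    return float(np.mean(bumps))

def one_epoch(lr, X, y, seed=0):
    """One plain gradient-descent step -- the cheap 'one epoch' probe."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Toy data: a noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=64)

# Score each hyper-parameter candidate (here: learning rate) by smoothness.
candidates = [0.001, 0.01, 0.1]
scores = {lr: smoothness_proxy(one_epoch(lr, X, y), X, y) for lr in candidates}
best = min(scores, key=scores.get)
print(best, scores[best])
```

The point of the sketch is the cost profile: each candidate is scored after a single cheap training step rather than a full training run, which is why a smoothness-guided search can beat optimizers that must fully train every configuration.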
Related papers
- Combining Automated Optimisation of Hyperparameters and Reward Shape [7.407166175374958]
We propose a methodology for the combined optimisation of hyperparameters and the reward function.
We conducted extensive experiments using Proximal Policy optimisation and Soft Actor-Critic.
Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others.
arXiv Detail & Related papers (2024-06-26T12:23:54Z)
- Should I try multiple optimizers when fine-tuning pre-trained Transformers for NLP tasks? Should I tune their hyperparameters? [14.349943044268471]
Stochastic Gradient Descent (SGD) is commonly employed to train neural models.
Tuning just the learning rate is, in most cases, as good as tuning all the hyperparameters.
We recommend picking any of the best-behaved adaptive optimizers (e.g., Adam) and tuning only its learning rate.
arXiv Detail & Related papers (2024-02-10T13:26:14Z)
- MADA: Meta-Adaptive Optimizers through Hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and the other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control.
The control is learned via a least squares problem by trying to fit a matching vector field.
Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
arXiv Detail & Related papers (2023-12-04T16:49:43Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-free) gradient-based adaption method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly in $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyper-parameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms [0.6543507682026964]
Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
arXiv Detail & Related papers (2021-09-13T19:10:23Z)
- Experimental Investigation and Evaluation of Model-based Hyperparameter Optimization [0.3058685580689604]
This article presents an overview of theoretical and practical results for popular machine learning algorithms.
The R package mlr is used as a uniform interface to the machine learning models.
arXiv Detail & Related papers (2021-07-19T11:37:37Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyper-parameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
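The zeroth-order hyper-gradient idea in the HOZOG entry above can be illustrated with a minimal sketch: estimate the derivative of the validation loss with respect to a hyper-parameter by finite differences, then take a descent step on the hyper-parameter. All names here (the ridge-penalty hyper-parameter, the closed-form inner solve, the step sizes) are assumptions for illustration, not HOZOG's actual formulation.

```python
import numpy as np

def val_loss(lam, X_tr, y_tr, X_va, y_va):
    """Fit ridge regression in closed form for penalty lam; return validation MSE."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return float(np.mean((X_va @ w - y_va) ** 2))

def zeroth_order_step(lam, step=0.1, mu=1e-3, **data):
    """One hyper-parameter update via a two-point (zeroth-order) gradient estimate."""
    g = (val_loss(lam + mu, **data) - val_loss(lam - mu, **data)) / (2 * mu)
    return max(lam - step * g, 1e-8)   # keep the penalty positive

# Toy data: a noisy linear target, split into train and validation halves.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=80)
data = dict(X_tr=X[:60], y_tr=y[:60], X_va=X[60:], y_va=y[60:])

lam = 1.0
for _ in range(20):
    lam = zeroth_order_step(lam, **data)
print(round(lam, 4))
```

The appeal of the zeroth-order approach is that it needs only function evaluations of the validation loss, never an analytic gradient through the training procedure, which is what makes it usable when the inner learner is a black box.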
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.