SMOOTHIE: A Theory of Hyper-parameter Optimization for Software
Analytics
- URL: http://arxiv.org/abs/2401.09622v1
- Date: Wed, 17 Jan 2024 22:23:29 GMT
- Title: SMOOTHIE: A Theory of Hyper-parameter Optimization for Software
Analytics
- Authors: Rahul Yedida and Tim Menzies
- Abstract summary: This paper implements and tests SMOOTHIE, a novel hyper-parameter optimizer that guides its optimizations via considerations of ``smoothness''.
Experiments include GitHub issue lifetime prediction, detecting false alarms in static code warnings, and defect prediction.
Better yet, SMOOTHIE ran 300% faster than the prior state of the art.
- Score: 14.0078949388954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyper-parameter optimization is the black art of tuning a learner's control
parameters. In software analytics, a repeated result is that such tuning can
result in dramatic performance improvements. Despite this, hyper-parameter
optimization is often applied rarely or poorly in software analytics--perhaps
because the CPU cost of exploring all those parameter options can be
prohibitive.
We theorize that learners generalize better when the loss landscape is
``smooth''. This theory is useful since the influence on ``smoothness'' of
different hyper-parameter choices can be tested very quickly (e.g. for a deep
learner, after just one epoch).
To test this theory, this paper implements and tests SMOOTHIE, a novel
hyper-parameter optimizer that guides its optimizations via considerations of
``smoothness''. The experiments of this paper test SMOOTHIE on numerous SE tasks
including (a) GitHub issue lifetime prediction; (b) detecting false alarms in
static code warnings; (c) defect prediction, and (d) a set of standard ML
datasets. In all these experiments, SMOOTHIE out-performed state-of-the-art
optimizers. Better yet, SMOOTHIE ran 300% faster than the prior state of the
art. We hence conclude that this theory (that hyper-parameter optimization is
best viewed as a ``smoothing'' function for the decision landscape) is both
theoretically interesting and practically very useful.
To support open science and other researchers working in this area, all our
scripts and datasets are available on-line at
https://github.com/yrahul3910/smoothness-hpo/.
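The core loop of such a smoothness-guided optimizer is easy to sketch. The fragment below is a minimal illustration, not the authors' implementation: it screens an (assumed) grid of candidate hyper-parameters by estimating a smoothness proxy after one epoch (here, the largest observed ratio of gradient change to weight change, an empirical gradient-Lipschitz constant, where smaller values are taken to mean a smoother landscape), then keeps only the smoothest configuration for full training.

```python
import numpy as np

# Toy data: binary classification (a placeholder for a real SE dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X @ rng.normal(size=10) + 0.1 * rng.normal(size=500) > 0).astype(float)

def grad(w, Xb, yb):
    """Gradient of the logistic loss for weights w on one mini-batch."""
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return Xb.T @ (p - yb) / len(yb)

def beta_after_one_epoch(lr, batch_size):
    """Proxy for the gradient-Lipschitz constant (beta-smoothness):
    the largest observed ||g(w') - g(w)|| / ||w' - w|| over one epoch."""
    w = np.zeros(X.shape[1])
    beta = 0.0
    for i in range(0, len(X), batch_size):
        Xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        g = grad(w, Xb, yb)
        w_next = w - lr * g
        g_next = grad(w_next, Xb, yb)
        step = np.linalg.norm(w_next - w)
        if step > 0:
            beta = max(beta, np.linalg.norm(g_next - g) / step)
        w = w_next
    return beta

# Candidate hyper-parameters (an assumed grid, for illustration only).
candidates = [(lr, bs) for lr in (0.01, 0.1, 1.0) for bs in (16, 64)]

# Smoothness screening: one cheap epoch per candidate, then spend the
# full training budget only on the smoothest configuration.
best = min(candidates, key=lambda c: beta_after_one_epoch(*c))
print("selected (lr, batch_size):", best)
```

The payoff is the cost model: each candidate is charged one epoch rather than a full training run, which is where the reported speed-up over conventional optimizers comes from.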
Related papers
- Be aware of overfitting by hyperparameter optimization! [0.0]
We show that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures.
We also extended the previous analysis by adding a representation learning method based on Natural Language Processing of SMILES strings, called Transformer CNN.
We show that across all analyzed sets using exactly the same protocol, Transformer CNN provided better results than graph-based methods for 26 out of 28 pairwise comparisons.
arXiv Detail & Related papers (2024-07-30T12:45:05Z)
- Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic [99.3682210827572]
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets.
Data curation strategies are typically developed agnostic of the available compute for training.
We introduce neural scaling laws that account for the non-homogeneous nature of web data.
arXiv Detail & Related papers (2024-04-10T17:27:54Z)
- Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health [13.19204187502255]
This paper only explores the application of niSNEAK to project health. That said, we see nothing in principle that prevents the application of this technique to a wider range of problems.
arXiv Detail & Related papers (2023-01-16T19:27:16Z)
- Semantic Preserving Adversarial Attack Generation with Autoencoder and Genetic Algorithm [29.613411948228563]
Small perturbations can fool state-of-the-art models into making incorrect predictions.
We propose a black-box attack that modifies latent features of data extracted by an autoencoder.
We trained autoencoders on the MNIST and CIFAR-10 datasets and found optimal adversarial perturbations using a genetic algorithm (a toy sketch follows below).
arXiv Detail & Related papers (2022-08-25T17:27:26Z)
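For intuition, here is a toy version of that attack: a genetic algorithm evolves a latent-space perturbation until a classifier's prediction flips while the perturbation stays small. The linear encoder/decoder, the classifier, and all GA settings below are assumed stand-ins for the paper's trained MNIST/CIFAR-10 models, not its actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for a trained autoencoder and classifier (assumptions, not
# the paper's models): random linear maps and a linear decision rule.
W_enc = rng.normal(size=(8, 32))   # encoder: 32-d input -> 8-d latent
W_dec = rng.normal(size=(32, 8))   # decoder: 8-d latent -> 32-d input
w_clf = rng.normal(size=32)        # classifier weight vector

def predict(x):
    return int(x @ w_clf > 0)

x0 = rng.normal(size=32)           # the clean example to attack
z0 = W_enc @ x0                    # its latent code
target = 1 - predict(x0)           # we want the prediction to flip

def fitness(dz):
    """Reward flipping the label; penalize large latent perturbations."""
    x_adv = W_dec @ (z0 + dz)
    flipped = predict(x_adv) == target
    return (1.0 if flipped else 0.0) - 0.05 * np.linalg.norm(dz)

# A minimal genetic algorithm over latent perturbations dz.
pop = rng.normal(scale=0.1, size=(50, 8))
for gen in range(100):
    scores = np.array([fitness(dz) for dz in pop])
    elite = pop[np.argsort(scores)[-10:]]             # selection
    parents = elite[rng.integers(0, 10, size=(50, 2))]
    mask = rng.random((50, 8)) < 0.5                  # uniform crossover
    pop = np.where(mask, parents[:, 0], parents[:, 1])
    pop += rng.normal(scale=0.05, size=pop.shape)     # mutation

best = max(pop, key=fitness)
print("prediction flipped:", predict(W_dec @ (z0 + best)) == target)
```

Because only `predict` outputs are queried, the search is black-box: no gradients of the attacked model are needed.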
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (a sketch of the online update follows below).
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
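A minimal sketch of OGD for a pairwise objective: each arriving example is paired with a few buffered examples of the opposite label, and the model takes one gradient step on a pairwise hinge loss (an AUC surrogate). The loss choice, buffer policy, and simulated stream are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
d, lr = 10, 0.1
w = np.zeros(d)
buffer = []  # previously seen (x, y) pairs

def pair_grad(w, xp, xn):
    """Gradient of the pairwise hinge loss max(0, 1 - w.(xp - xn)),
    which pushes positive scores above negative ones."""
    diff = xp - xn
    return -diff if w @ diff < 1 else np.zeros_like(w)

# Simulated stream: pair each new point with a few buffered points of
# the opposite label, then take one OGD step per arriving example.
for t in range(500):
    x = rng.normal(size=d)
    y = int(x[0] + 0.1 * rng.normal() > 0)   # toy label
    opposite = [(xb, yb) for xb, yb in buffer if yb != y][-5:]
    g = np.zeros(d)
    for xb, yb in opposite:
        xp, xn = (x, xb) if y == 1 else (xb, x)
        g += pair_grad(w, xp, xn)
    if opposite:
        w -= lr * g / len(opposite)
    buffer.append((x, y))

print("learned weights (first 3):", np.round(w[:3], 2))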
- Self-Supervised Neural Architecture Search for Imbalanced Datasets [129.3987858787811]
Neural Architecture Search (NAS) provides state-of-the-art results when trained on well-curated datasets with annotated labels.
We propose a NAS-based framework that makes three contributions, including: (a) we focus on the self-supervised scenario, where no labels are required to determine the architecture, and (b) we assume the datasets are imbalanced.
arXiv Detail & Related papers (2021-09-17T14:56:36Z)
- Discriminative-Generative Dual Memory Video Anomaly Detection [81.09977516403411]
Recent work uses a few anomalies for video anomaly detection (VAD), instead of only normal data, during training.
We propose a DiscRiminative-gEnerative duAl Memory (DREAM) anomaly detection model to take advantage of a few anomalies and solve data imbalance.
arXiv Detail & Related papers (2021-04-29T15:49:01Z)
- MementoML: Performance of selected machine learning algorithm configurations on OpenML100 datasets [5.802346990263708]
We present a protocol for generating benchmark data that describes the performance of different ML algorithms.
Data collected in this way is used to study the factors influencing algorithm performance.
arXiv Detail & Related papers (2020-08-30T13:13:52Z)
- How to tune the RBF SVM hyperparameters?: An empirical evaluation of 18 search algorithms [4.394728504061753]
We evaluate 18 search algorithms on 115 real-life binary data sets.
We find that the tree of Parzen estimators searches better, with only a slight increase in time with respect to a grid search of the same size.
We also find no significant differences among the procedures for selecting the best hyper-parameter set when more than one is found by the search algorithms (a grid-vs-random sketch follows below).
arXiv Detail & Related papers (2020-08-26T16:28:48Z)
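As a concrete baseline from that comparison, tuning the two RBF SVM hyper-parameters (C and gamma) by grid search versus random search takes a few lines of scikit-learn; the search space and budgets below are assumed examples, not the paper's protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# A synthetic stand-in for one of the 115 binary data sets.
X, y = make_classification(n_samples=300, random_state=0)

# Log-spaced grid over the two RBF SVM hyper-parameters.
grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-4, 1, 6)}

# Exhaustive grid search: all 36 candidate (C, gamma) pairs.
gs = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)

# Random search: same space, a third of the evaluation budget.
rs = RandomizedSearchCV(SVC(kernel="rbf"), grid, n_iter=12, cv=5,
                        random_state=0).fit(X, y)

print("grid  :", gs.best_params_, round(gs.best_score_, 3))
print("random:", rs.best_params_, round(rs.best_score_, 3))
```

More sophisticated searchers (e.g. trees of Parzen estimators) plug into the same loop: propose a (C, gamma) pair, cross-validate, repeat under a fixed budget.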
- New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data.
The algorithms satisfy differential privacy even in the worst case.
Compared to the state-of-the-art High-Dimensional Matrix Mechanism (McKenna et al., 2018), our algorithms provide better accuracy for large workloads.
arXiv Detail & Related papers (2020-07-10T15:46:05Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate (a sketch follows below).
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
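The experience-replay idea can be sketched as follows: rather than updating on each freshly arrived (and hence correlated) Markov-chain sample, store samples in a buffer and update on uniformly drawn past samples, which de-correlates consecutive gradient steps. The AR(1) chain, noise level, and step size below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, lr, T = 5, 0.05, 2000
w_true = rng.normal(size=d)

# A simple Markov chain over covariates: AR(1), so consecutive
# data points are strongly correlated rather than i.i.d.
x = rng.normal(size=d)
buffer, w = [], np.zeros(d)

for t in range(T):
    x = 0.9 * x + np.sqrt(1 - 0.9**2) * rng.normal(size=d)  # next state
    y = x @ w_true + 0.1 * rng.normal()
    buffer.append((x.copy(), y))
    # Experience replay: take the SGD step on a uniformly sampled PAST
    # example, not the freshly arrived (correlated) one.
    xi, yi = buffer[rng.integers(len(buffer))]
    w -= lr * (xi @ w - yi) * xi

print("parameter error:", round(np.linalg.norm(w - w_true), 3))
```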