MementoML: Performance of selected machine learning algorithm
configurations on OpenML100 datasets
- URL: http://arxiv.org/abs/2008.13162v1
- Date: Sun, 30 Aug 2020 13:13:52 GMT
- Title: MementoML: Performance of selected machine learning algorithm
configurations on OpenML100 datasets
- Authors: Wojciech Kretowicz, Przemysław Biecek
- Abstract summary: We present the protocol of generating benchmark data describing the performance of different ML algorithms.
Data collected in this way is used to study the factors influencing the algorithm's performance.
- Score: 5.802346990263708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding optimal hyperparameters for a machine learning algorithm can
often significantly improve its performance. But how can they be chosen in a
time-efficient way? In this paper we present a protocol for generating
benchmark data describing the performance of different ML algorithms under
different hyperparameter configurations. Data collected in this way can be used
to study the factors influencing an algorithm's performance.
This collection was prepared for the purposes of the EPP study. We tested the
algorithms' performance on a dense grid of hyperparameters. The tested datasets
and hyperparameters were chosen before any algorithm was run and were not
changed afterwards. This differs from the approach usually taken in
hyperparameter tuning, where the selection of candidate hyperparameters depends
on previously obtained results. However, a fixed selection allows for
systematic analysis of the sensitivity of performance to individual
hyperparameters.
The result is a comprehensive dataset of such benchmarks that we would like to
share. We hope that the computed and collected results may be helpful for other
researchers. This paper describes how the data were collected. Here you can
find benchmarks of 7 popular machine learning algorithms on 39 OpenML
datasets.
The detailed data forming this benchmark are available at:
https://www.kaggle.com/mi2datalab/mementoml.
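To make the protocol concrete, here is a minimal sketch, in Python with
scikit-learn, of a fixed-grid benchmark of the kind described above. The
dataset IDs, the algorithm, and the grid are illustrative placeholders, not the
paper's actual choices; the real benchmark covers 7 algorithms and 39 OpenML
datasets.

```python
# Minimal sketch of a fixed-grid benchmark protocol. The dataset IDs,
# the algorithm, and the grid below are illustrative placeholders, NOT
# the choices made in the paper.
import itertools

from sklearn.datasets import fetch_openml
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

DATA_IDS = [37, 44]  # hypothetical OpenML dataset IDs
GRID = {
    "n_estimators": [50, 100, 500],
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [2, 4, 8],
}

results = []
keys = list(GRID)
for data_id in DATA_IDS:
    X, y = fetch_openml(data_id=data_id, return_X_y=True, as_frame=False)
    # The dense grid is the Cartesian product of all value lists,
    # fixed up front and never adapted to intermediate results.
    for values in itertools.product(*(GRID[k] for k in keys)):
        params = dict(zip(keys, values))
        auc = cross_val_score(
            GradientBoostingClassifier(**params), X, y,
            cv=5, scoring="roc_auc",
        ).mean()
        results.append({"data_id": data_id, **params, "auc": auc})
```

An adaptive tuner would instead pick the next configuration based on the scores
observed so far; keeping the grid fixed up front is what makes a systematic
per-hyperparameter sensitivity analysis possible.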
Related papers
- Data Classification With Multiprocessing [6.513930657238705]
Python multiprocessing is used to test this hypothesis with different classification algorithms.
We conclude that ensembling improves accuracy and multiprocessing reduces execution time for the selected algorithms; a generic sketch of this pattern appears after the list.
arXiv Detail & Related papers (2023-12-23T03:42:13Z)
- Massively Parallel Genetic Optimization through Asynchronous Propagation
of Populations [50.591267188664666]
Propulate is an evolutionary optimization algorithm and software package for global optimization.
We provide an MPI-based implementation of our algorithm, which features variants of selection, mutation, crossover, and migration.
We find that Propulate is up to three orders of magnitude faster without sacrificing solution accuracy.
arXiv Detail & Related papers (2023-01-20T18:17:34Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyperparameter tuning.
We show that using gradient-based data subsets for hyperparameter tuning achieves significantly faster turnaround times, with speedups of 3x-30x.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- A Comparative study of Hyper-Parameter Optimization Tools [2.6097538974670935]
We compare the performance of four Python libraries, namely Optuna, Hyperopt, Optunity, and Sequential Model-based Algorithm Configuration (SMAC).
We found that Optuna performs better on the CASH problem and the NeurIPS black-box optimization challenge.
arXiv Detail & Related papers (2022-01-17T14:49:36Z)
- Experimental Investigation and Evaluation of Model-based Hyperparameter
Optimization [0.3058685580689604]
This article presents an overview of theoretical and practical results for popular machine learning algorithms.
The R package mlr is used as a uniform interface to the machine learning models.
arXiv Detail & Related papers (2021-07-19T11:37:37Z)
- Meta-Learning for Symbolic Hyperparameter Defaults [2.928016570228877]
Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data.
We propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset (e.g., a hyperparameter set to a function of the number of features).
This enables a much faster, but still data-dependent, configuration of the ML algorithm; a toy illustration appears after the list.
arXiv Detail & Related papers (2021-06-10T14:20:28Z)
- How much progress have we made in neural network training? A New
Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- How to tune the RBF SVM hyperparameters?: An empirical evaluation of 18
search algorithms [4.394728504061753]
We evaluate 18 search algorithms on 115 real-life binary data sets.
We find that the better search procedures incur only a slight increase in time with respect to grid search.
We also find no significant differences among the different procedures when more than one best configuration is found by the search algorithms.
arXiv Detail & Related papers (2020-08-26T16:28:48Z)
- New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data.
The algorithms satisfy differential privacy even in the worst case.
Compared to the state-of-the-art method, the High-Dimensional Matrix Mechanism [McKennaMHM18], our algorithms provide better accuracy on large workloads.
arXiv Detail & Related papers (2020-07-10T15:46:05Z)
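As referenced in the Data Classification With Multiprocessing entry above, here
is a generic sketch of the multiprocessing-plus-ensembling pattern. It is not
the paper's code; the data and models are placeholders, and only the standard
library and scikit-learn are assumed.

```python
# Generic sketch: train several classifiers in parallel processes, then
# combine their predictions by majority vote. Data and models are
# placeholders, not the paper's setup.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_and_predict(model, X_tr, y_tr, X_te):
    """Train one classifier in its own process; return test predictions."""
    return model.fit(X_tr, y_tr).predict(X_te)


if __name__ == "__main__":
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    models = [
        LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0),
        RandomForestClassifier(random_state=0),
    ]
    n = len(models)
    # Each classifier trains in a separate process, so total wall-clock
    # time is roughly that of the slowest single model.
    with ProcessPoolExecutor() as pool:
        preds = list(pool.map(fit_and_predict, models,
                              [X_tr] * n, [y_tr] * n, [X_te] * n))
    # Majority vote over the binary {0, 1} predictions.
    vote = (np.vstack(preds).mean(axis=0) >= 0.5).astype(int)
    print("ensemble accuracy:", (vote == y_te).mean())
```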
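And, as referenced in the Meta-Learning for Symbolic Hyperparameter Defaults
entry, a toy illustration of a symbolic default, i.e. a hyperparameter
expressed as a function of dataset properties. The sqrt(p) formula is the
classic random-forest heuristic, used here only as an example; the paper
meta-learns such formulas rather than hard-coding this one.

```python
# Illustrative symbolic default: a hyperparameter computed from dataset
# properties rather than fixed to a constant ahead of time.
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data; p is the number of features of whatever dataset is used.
X, y = make_classification(n_samples=1000, n_features=64, random_state=0)
p = X.shape[1]

# Symbolic default: max_features = round(sqrt(p)), the classic
# random-forest heuristic, evaluated on this particular dataset.
clf = RandomForestClassifier(max_features=round(math.sqrt(p)), random_state=0)
clf.fit(X, y)
```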
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.