YAHPO Gym -- Design Criteria and a new Multifidelity Benchmark for
Hyperparameter Optimization
- URL: http://arxiv.org/abs/2109.03670v1
- Date: Wed, 8 Sep 2021 14:16:31 GMT
- Title: YAHPO Gym -- Design Criteria and a new Multifidelity Benchmark for
Hyperparameter Optimization
- Authors: Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder,
Bernd Bischl
- Abstract summary: We present a new surrogate-based benchmark suite for multifidelity HPO methods consisting of 9 benchmark collections that constitute over 700 multifidelity HPO problems in total.
All our benchmarks also allow for querying of multiple optimization targets, enabling the benchmarking of multi-objective HPO.
- Score: 1.0718353079920009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When developing and analyzing new hyperparameter optimization (HPO) methods,
it is vital to empirically evaluate and compare them on well-curated benchmark
suites. In this work, we list desirable properties and requirements for such
benchmarks and propose a new set of challenging and relevant multifidelity HPO
benchmark problems motivated by these requirements. For this, we revisit the
concept of surrogate-based benchmarks and empirically compare them to more
widely-used tabular benchmarks, showing that the latter ones may induce bias in
performance estimation and ranking of HPO methods. We present a new
surrogate-based benchmark suite for multifidelity HPO methods consisting of 9
benchmark collections that constitute over 700 multifidelity HPO problems in
total. All our benchmarks also allow for querying of multiple optimization
targets, enabling the benchmarking of multi-objective HPO. We examine and
compare our benchmark suite with respect to the defined requirements and show
that our benchmarks provide viable additions to existing suites.
Related papers
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z) - LMEMs for post-hoc analysis of HPO Benchmarking [38.39259273088395]
We apply Linear Mixed-Effect Models-based (LMEMs) significance testing for post-hoc analysis of HPO benchmarking runs.
LMEMs allow flexible and expressive modeling on the entire experiment data, including information such as benchmark meta-features.
We demonstrate this through a case study on the PriorBand paper's experiment data to find insights not reported in the original work.
arXiv Detail & Related papers (2024-08-05T15:03:19Z) - Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime [59.27851754647913]
Predictive optimization is the precise modeling of many real-world applications, including energy cost-aware scheduling and budget allocation on advertising.
We develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for advertising.
Our study shows that PnO approaches are better than PtO on 7 out of 8 benchmarks, but there is no silver bullet found for the specific design choices of PnO.
arXiv Detail & Related papers (2023-11-13T13:19:34Z) - Obeying the Order: Introducing Ordered Transfer Hyperparameter
Optimisation [10.761476482982077]
OTHPO is a version of transfer learning where the tasks follow a sequential order.
We empirically show the importance of taking order into account using ten benchmarks.
We open source the benchmarks to foster future research on ordered transfer HPO.
arXiv Detail & Related papers (2023-06-29T13:08:36Z) - FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization [50.12374973760274]
We propose and implement a benchmark suite FedHPO-B that incorporates comprehensive FL tasks, enables efficient function evaluations, and eases continuing extensions.
We also conduct extensive experiments based on FedHPO-B to benchmark a few HPO methods.
arXiv Detail & Related papers (2022-06-08T15:29:10Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for
Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z) - A survey on multi-objective hyperparameter optimization algorithms for
Machine Learning [62.997667081978825]
This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms.
We distinguish between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both.
We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
arXiv Detail & Related papers (2021-11-23T10:22:30Z) - HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems
for HPO [30.89560505052524]
We propose HPOBench, which includes 7 existing and 5 new benchmark families, with in total more than 100 multi-fidelity benchmark problems.
HPOBench allows to run this extendable set of multi-fidelity HPO benchmarks in a reproducible way by isolating and packaging the individual benchmarks in containers.
arXiv Detail & Related papers (2021-09-14T14:28:51Z) - HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on
OpenML [5.735035463793008]
We present HPO-B, a large-scale benchmark for comparing HPO algorithms.
Our benchmark is assembled and preprocessed from the OpenML repository.
We detail explicit experimental protocols, splits, and evaluation measures for comparing methods for both non-transfer and transfer learning HPO.
arXiv Detail & Related papers (2021-06-11T09:18:39Z) - Do Question Answering Modeling Improvements Hold Across Benchmarks? [84.48867898593052]
We measure concurrence between 32 QA benchmarks on a set of 20 diverse modeling approaches.
Despite years of intense community focus on a small number of benchmarks, the modeling improvements studied hold broadly.
arXiv Detail & Related papers (2021-02-01T18:55:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.