YAHPO Gym -- Design Criteria and a new Multifidelity Benchmark for
Hyperparameter Optimization
- URL: http://arxiv.org/abs/2109.03670v1
- Date: Wed, 8 Sep 2021 14:16:31 GMT
- Title: YAHPO Gym -- Design Criteria and a new Multifidelity Benchmark for
Hyperparameter Optimization
- Authors: Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder,
Bernd Bischl
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When developing and analyzing new hyperparameter optimization (HPO) methods,
it is vital to empirically evaluate and compare them on well-curated benchmark
suites. In this work, we list desirable properties and requirements for such
benchmarks and propose a new set of challenging and relevant multifidelity HPO
benchmark problems motivated by these requirements. For this, we revisit the
concept of surrogate-based benchmarks and empirically compare them to more
widely used tabular benchmarks, showing that the latter may induce bias in
performance estimation and ranking of HPO methods. We present a new
surrogate-based benchmark suite for multifidelity HPO methods consisting of 9
benchmark collections that constitute over 700 multifidelity HPO problems in
total. All our benchmarks also allow for querying of multiple optimization
targets, enabling the benchmarking of multi-objective HPO. We examine and
compare our benchmark suite with respect to the defined requirements and show
that our benchmarks provide viable additions to existing suites.
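The abstract describes surrogate-based benchmarks that map a hyperparameter configuration plus a fidelity value to several optimization targets at once. A minimal sketch of that query interface follows; the function name, hyperparameters, and the toy linear "surrogate" are illustrative assumptions, not the actual YAHPO Gym API, and a real suite would answer queries with a fitted surrogate model.

```python
# Hypothetical sketch of a surrogate-based multifidelity benchmark query:
# one (configuration, fidelity) pair yields multiple objective values.
# All names and the toy response surface below are illustrative only.

def surrogate_objectives(config: dict, fidelity: float) -> dict:
    """Stand-in surrogate: returns several targets for a single query."""
    lr, wd = config["learning_rate"], config["weight_decay"]
    # Toy response surface: error shrinks as fidelity grows,
    # while (simulated) runtime grows with fidelity.
    val_error = (lr - 0.01) ** 2 + wd + (1.0 - fidelity) * 0.1
    runtime = 5.0 + 100.0 * fidelity
    return {"val_error": round(val_error, 6), "runtime": runtime}

cfg = {"learning_rate": 0.01, "weight_decay": 0.0}
cheap = surrogate_objectives(cfg, fidelity=0.1)  # coarse, fast estimate
full = surrogate_objectives(cfg, fidelity=1.0)   # refined, costly estimate
```

Multifidelity HPO methods exploit exactly this trade-off: many cheap low-fidelity queries to triage configurations, and a few high-fidelity queries to confirm the best candidates; returning a dict of targets rather than a scalar is what enables the multi-objective benchmarking the abstract mentions.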
Related papers
- Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
arXiv Detail & Related papers (2024-05-02T13:48:37Z)
- Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML).
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
- Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation [10.761476482982077]
OTHPO is a version of transfer learning where the tasks follow a sequential order.
We empirically show the importance of taking order into account using ten benchmarks.
We open source the benchmarks to foster future research on ordered transfer HPO.
arXiv Detail & Related papers (2023-06-29T13:08:36Z)
- A Survey of Parameters Associated with the Quality of Benchmarks in NLP [24.6240575061124]
Recent studies have shown that models triumph over several popular benchmarks just by overfitting on spurious biases, without truly learning the desired task.
A potential solution to these issues -- a metric quantifying quality -- remains underexplored.
We take the first step towards a metric by identifying certain language properties that can represent various possible interactions leading to biases in a benchmark.
arXiv Detail & Related papers (2022-10-14T06:44:14Z)
- FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization [50.12374973760274]
We propose and implement a benchmark suite FedHPO-B that incorporates comprehensive FL tasks, enables efficient function evaluations, and eases continuing extensions.
We also conduct extensive experiments based on FedHPO-B to benchmark a few HPO methods.
arXiv Detail & Related papers (2022-06-08T15:29:10Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
- A survey on multi-objective hyperparameter optimization algorithms for Machine Learning [62.997667081978825]
This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms.
We distinguish between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both.
We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
arXiv Detail & Related papers (2021-11-23T10:22:30Z)
- HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO [30.89560505052524]
We propose HPOBench, which includes 7 existing and 5 new benchmark families, with in total more than 100 multi-fidelity benchmark problems.
HPOBench makes it possible to run this extendable set of multi-fidelity HPO benchmarks in a reproducible way by isolating and packaging the individual benchmarks in containers.
arXiv Detail & Related papers (2021-09-14T14:28:51Z)
- HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML [5.735035463793008]
We present HPO-B, a large-scale benchmark for comparing HPO algorithms.
Our benchmark is assembled and preprocessed from the OpenML repository.
We detail explicit experimental protocols, splits, and evaluation measures for comparing methods for both non-transfer and transfer learning HPO.
arXiv Detail & Related papers (2021-06-11T09:18:39Z)
- Do Question Answering Modeling Improvements Hold Across Benchmarks? [84.48867898593052]
We measure concurrence between 32 QA benchmarks on a set of 20 diverse modeling approaches.
Despite years of intense community focus on a small number of benchmarks, the modeling improvements studied hold broadly.
arXiv Detail & Related papers (2021-02-01T18:55:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.