On the Assessment of Benchmark Suites for Algorithm Comparison
- URL: http://arxiv.org/abs/2104.07381v1
- Date: Thu, 15 Apr 2021 11:20:11 GMT
- Title: On the Assessment of Benchmark Suites for Algorithm Comparison
- Authors: David Issa Mattos, Lucas Ruud, Jan Bosch, Helena Holmström Olsson
- Abstract summary: We show that most benchmark functions of the BBOB suite have high difficulty levels (compared to the optimization algorithms) and low discrimination.
We discuss potential uses of IRT in benchmarking, including its use to improve the design of benchmark suites.
- Score: 7.501426386641256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmark suites, i.e., collections of benchmark functions, are widely used
in the comparison of black-box optimization algorithms. Over the years,
research has identified many desired qualities for benchmark suites, such as
diverse topology, varied difficulty, scalability, and representativeness of
real-world problems, among others. However, while topology characteristics
have been the subject of previous studies, no study has
statistically evaluated the difficulty level of benchmark functions, how well
they discriminate between optimization algorithms, or how suitable a benchmark
suite is for algorithm comparison. In this paper, we propose the use of an item response
theory (IRT) model, the Bayesian two-parameter logistic model for multiple
attempts, to statistically evaluate these aspects with respect to the empirical
success rate of algorithms. With this model, we can assess the difficulty level
of each benchmark, how well they discriminate different algorithms, the ability
score of an algorithm, and how much information the benchmark suite adds in the
estimation of the ability scores. We demonstrate the use of this model in two
well-known benchmark suites, the Black-Box Optimization Benchmark (BBOB) for
continuous optimization and the Pseudo Boolean Optimization (PBO) for discrete
optimization. We found that most benchmark functions of the BBOB suite have high
difficulty levels (compared to the optimization algorithms) and low
discrimination. For the PBO, most functions have good discrimination parameters
but are often considered too easy. We discuss potential uses of IRT in
benchmarking, including its use to improve the design of benchmark suites, to
measure multiple aspects of the algorithms, and to design adaptive suites.
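The two-parameter logistic (2PL) model at the heart of the abstract can be sketched in a few lines. This is a minimal illustration of the standard 2PL response curve and its Fisher information, not the paper's Bayesian implementation; the function names and parameter values are assumptions for exposition. Each benchmark gets a difficulty b and a discrimination a, each algorithm an ability score theta, and the item information a²P(1−P) indicates how much a benchmark contributes to estimating theta.

```python
import math

def p_success(theta, a, b):
    """2PL item response model: probability that an algorithm with
    ability `theta` succeeds on a benchmark with difficulty `b` and
    discrimination `a` (the slope of the response curve)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability `theta`: high
    values mean the benchmark sharply separates algorithms whose
    abilities lie near its difficulty."""
    p = p_success(theta, a, b)
    return a * a * p * (1.0 - p)

# When ability equals difficulty the success probability is 0.5 and
# the item information peaks; a benchmark that is far too hard or far
# too easy for the algorithms under test carries little information,
# which is the paper's diagnosis for parts of BBOB and PBO.
```

This also makes the paper's findings concrete: a "too difficult" benchmark is one whose b exceeds all the algorithms' theta values, and a "low discrimination" benchmark is one with small a, so its response curve barely separates strong from weak algorithms.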
Related papers
- Towards Robust Benchmarking of Quantum Optimization Algorithms [3.9456729020535013]
A key problem in existing benchmarking frameworks is the lack of equal effort in optimizing for the best quantum and the best classical approaches.
This paper presents a comprehensive set of guidelines comprising universal steps towards fair benchmarks.
arXiv Detail & Related papers (2024-05-13T10:35:23Z)
- Prasatul Matrix: A Direct Comparison Approach for Analyzing Evolutionary Optimization Algorithms [2.1320960069210475]
Direct comparison approach is proposed to analyze the performance of evolutionary optimization algorithms.
Five different performance measures are designed based on the prasatul matrix to evaluate the performance of algorithms.
arXiv Detail & Related papers (2022-12-01T17:21:44Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of Monte Carlo tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
- EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions [4.8980686156238535]
We provide an extensive comparison of six different surrogate algorithms on four expensive optimisation problems from different real-life applications.
This has led to new insights regarding the relative importance of exploration, the evaluation time of the objective, and the used model.
We make the algorithms and benchmark problem instances publicly available, contributing to more uniform analysis of surrogate algorithms.
arXiv Detail & Related papers (2021-06-08T18:17:42Z)
- Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)
- Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic gradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Black-Box Optimization Revisited: Improving Algorithm Selection Wizards through Massive Benchmarking [8.874754363200614]
Existing studies in black-box optimization for machine learning suffer from low generalizability.
We propose a benchmark suite, OptimSuite, which covers a broad range of black-box optimization problems.
ABBO achieves competitive performance on all benchmark suites.
arXiv Detail & Related papers (2020-10-08T14:17:30Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
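The least-squares idea above can be sketched as follows, assuming each pairwise comparison is a noisy observation of a quality difference q_i − q_j; the function name and the gauge-fixing choice (qualities sum to zero) are illustrative assumptions, not the paper's exact algorithm. The estimate solves the normal equations of the comparison graph.

```python
def rank_least_squares(n, comparisons):
    """Non-adaptive least-squares ranking from noisy pairwise data.

    comparisons: list of (i, j, y) where y is a noisy observation of
    q_i - q_j.  Solves the normal equations (L + 11^T) q = r, where L
    is the Laplacian of the comparison graph and r aggregates the
    observations; the rank-one all-ones term fixes the gauge so the
    estimated qualities sum to zero.
    """
    # Build L + 11^T and the right-hand side r.
    M = [[1.0] * n for _ in range(n)]   # start from the 11^T gauge term
    r = [0.0] * n
    for i, j, y in comparisons:
        M[i][i] += 1.0
        M[j][j] += 1.0
        M[i][j] -= 1.0
        M[j][i] -= 1.0
        r[i] += y
        r[j] -= y
    # Plain Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for k in range(col, n):
                M[row][k] -= f * M[col][k]
            r[row] -= f * r[col]
    # Back substitution.
    q = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = r[row] - sum(M[row][k] * q[k] for k in range(row + 1, n))
        q[row] = s / M[row][row]
    return q
```

With consistent comparisons such as q_0 − q_1 = 2, q_1 − q_2 = 2, q_0 − q_2 = 4, the recovered qualities reproduce the pairwise differences exactly, up to the chosen gauge.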
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
- Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.