Analyzing the Impact of Undersampling on the Benchmarking and
Configuration of Evolutionary Algorithms
- URL: http://arxiv.org/abs/2204.09353v2
- Date: Fri, 22 Apr 2022 10:11:05 GMT
- Title: Analyzing the Impact of Undersampling on the Benchmarking and
Configuration of Evolutionary Algorithms
- Authors: Diederick Vermetten and Hao Wang and Manuel López-Ibáñez and
Carola Doerr and Thomas Bäck
- Abstract summary: We show that care should be taken when making decisions based on limited data.
We show examples of performance losses of more than 20%, even when using statistical races to dynamically adjust the number of runs.
- Score: 3.967483941966979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic nature of iterative optimization heuristics leads to
inherently noisy performance measurements. Since these measurements are often
gathered once and then used repeatedly, the number of collected samples will
have a significant impact on the reliability of algorithm comparisons. We show
that care should be taken when making decisions based on limited data.
In particular, we show that the number of runs used in many benchmarking
studies, e.g., the default value of 15 suggested by the COCO environment, can
be insufficient to reliably rank algorithms on well-known numerical
optimization benchmarks.
Additionally, methods for automated algorithm configuration are sensitive to
insufficient sample sizes. This may result in the configurator choosing a
"lucky" but poor-performing configuration despite exploring better ones. We
show that relying on mean performance values, as many configurators do, can
require a large number of runs to provide accurate comparisons between the
considered configurations. Common statistical tests can greatly improve the
situation in most cases but not always. We show examples of performance losses
of more than 20%, even when using statistical races to dynamically adjust the
number of runs, as done by irace. Our results underline the importance of
appropriately considering the statistical distribution of performance values.
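
As a rough illustration of the core claim, the following self-contained simulation (with invented log-normal performance distributions, not the paper's data) estimates how often the truly worse of two algorithms wins a comparison of sample means:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two synthetic algorithms with log-normal "performance" distributions.
# B is truly better (lower mean), but the distributions overlap heavily.
MU_A, MU_B, SIGMA = 1.00, 0.90, 0.5        # invented log-scale parameters

def misrank_rate(n_runs, n_trials=20_000):
    """Fraction of simulated studies in which the worse algorithm A
    obtains a lower sample mean than the better algorithm B."""
    a = rng.lognormal(MU_A, SIGMA, size=(n_trials, n_runs)).mean(axis=1)
    b = rng.lognormal(MU_B, SIGMA, size=(n_trials, n_runs)).mean(axis=1)
    return np.mean(a < b)

for n in (5, 15, 50, 200):
    print(f"{n:4d} runs -> misranking rate ~ {misrank_rate(n):.3f}")
```

With these particular parameters, 15 runs misrank the pair in roughly 30% of simulated studies, and even 50 runs leave a noticeable error rate; the exact numbers depend entirely on the assumed distributions.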
Related papers
- A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough? [0.8575004906002217]
We present a statistical analysis of the common metrics, and develop guidelines for experiment design.
We derive a lower bound on the number of repeats required to guarantee a given accuracy in the metrics.
We propose an algorithm to adaptively adjust the number of repeats needed to ensure the accuracy of the evaluated metric.
(arXiv 2025-03-20)
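
The entry does not state its bound explicitly; a textbook CLT-based bound (an assumption here, not necessarily the paper's result) requires roughly n >= (z * sigma / eps)^2 repeats for the sample mean to fall within +/- eps of the true mean with confidence 1 - alpha:

```python
import math
from scipy.stats import norm

def min_repeats(sigma, eps, alpha=0.05):
    """CLT-based lower bound on repeats so that the sample mean lies
    within +/- eps of the true mean with probability ~ 1 - alpha."""
    z = norm.ppf(1 - alpha / 2)          # e.g. 1.96 for alpha = 0.05
    return math.ceil((z * sigma / eps) ** 2)

# Example: runtime std-dev of 30s, desired accuracy of +/- 5s at 95%.
print(min_repeats(sigma=30.0, eps=5.0))  # -> 139
```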
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
(arXiv 2025-01-15)
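
The inverted multi-index itself is beyond a short snippet, but the sampling-bias concern the entry mentions originates in sampled softmax. The standard correction, subtracting the log proposal probability from each sampled logit, is sketched below (a generic illustration, not the MIDX-Sampler):

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_softmax_logits(logits, target, proposal_p, n_samples):
    """Build a reduced softmax over {target} + sampled negatives,
    correcting each logit by -log q(class), the textbook sampled-softmax
    adjustment that keeps the reduced loss consistent in expectation."""
    classes = np.arange(len(logits))
    negatives = rng.choice(classes, size=n_samples, replace=False, p=proposal_p)
    negatives = negatives[negatives != target]       # drop accidental hits
    idx = np.concatenate(([target], negatives))
    corrected = logits[idx] - np.log(proposal_p[idx])
    return idx, corrected

# Toy vocabulary of 1000 classes with a skewed proposal distribution.
V = 1000
logits = rng.normal(size=V)
proposal = np.arange(1, V + 1, dtype=float) ** -0.75
proposal /= proposal.sum()
idx, corr = sampled_softmax_logits(logits, target=3, proposal_p=proposal, n_samples=20)
loss = -corr[0] + np.log(np.exp(corr).sum())         # cross-entropy on the subset
print(f"sampled classes: {len(idx)}, loss: {loss:.3f}")
```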
- Scaling LLM Inference with Optimized Sample Compute Allocation [56.524278187351925]
We propose OSCA, an algorithm to find an optimal mix of different inference configurations.
Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration.
OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration.
(arXiv 2024-10-29)
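
OSCA's actual procedure is not described in the summary. As a loose toy model of why a mix of configurations can beat the best single one (all task types, success rates, and costs invented), the sketch below exhaustively splits a fixed sample budget across two configurations that succeed on different task types:

```python
from itertools import product

# Hypothetical setup: two task types, two inference configurations whose
# per-sample success rates differ by type (all numbers invented).
TYPE_FREQ = {"math": 0.6, "code": 0.4}
SUCCESS = {"config_A": {"math": 0.50, "code": 0.05},
           "config_B": {"math": 0.05, "code": 0.50}}
BUDGET = 4                                    # total samples per question

def accuracy(alloc):
    """Expected P(at least one sample is correct), averaged over task types."""
    acc = 0.0
    for task, freq in TYPE_FREQ.items():
        p_fail = 1.0
        for cfg, n in alloc.items():
            p_fail *= (1.0 - SUCCESS[cfg][task]) ** n
        acc += freq * (1.0 - p_fail)
    return acc

best = max((dict(zip(SUCCESS, counts)) for counts in
            product(range(BUDGET + 1), repeat=len(SUCCESS))
            if sum(counts) == BUDGET), key=accuracy)
print(best, f"accuracy ~ {accuracy(best):.3f}")   # the 2+2 mix beats either pure allocation
```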
- Gradient Descent Efficiency Index [0.0]
This study introduces a new efficiency metric, Ek, designed to quantify the effectiveness of each iteration.
The proposed metric accounts for both the relative change in error and the stability of the loss function across iterations.
Ek has the potential to guide more informed decisions in the selection and tuning of optimization algorithms in machine learning applications.
(arXiv 2024-10-25)
- FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs [92.47146416628965]
FuzzyFlow is a fault localization and test case extraction framework designed to test program optimizations.
We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations.
To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation.
(arXiv 2023-06-28)
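
FuzzyFlow's dataflow-based minimization, with its memory-for-recomputation trade-off, is not reproduced here; a classic delta-debugging-style reduction conveys the basic idea of shrinking a failing test input while preserving the failure:

```python
def minimize(inputs, still_fails):
    """Greedy delta-debugging-style reduction: repeatedly try to drop
    chunks of the input while the (expensive) check still fails."""
    chunk = max(1, len(inputs) // 2)
    while chunk >= 1:
        i, shrunk = 0, False
        while i < len(inputs):
            candidate = inputs[:i] + inputs[i + chunk:]
            if candidate and still_fails(candidate):
                inputs, shrunk = candidate, True   # keep the smaller input
            else:
                i += chunk                          # this chunk is needed
        if not shrunk:
            chunk //= 2                             # refine granularity
    return inputs

# Toy oracle: the "optimization bug" triggers whenever 3 and 7 co-occur.
fails = lambda xs: 3 in xs and 7 in xs
print(minimize(list(range(10)), fails))  # -> [3, 7]
```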
- Defining Standard Strategies for Quantum Benchmarks [0.1759008116536278]
We define a set of characteristics that any benchmark should follow, and make a distinction between benchmarks and diagnostics.
We discuss the issue of benchmark optimizations, detail when those optimizations are appropriate, and how they should be reported.
We introduce a scalable mirror quantum volume benchmark.
(arXiv 2023-03-03)
- Selection of the Most Probable Best [2.1095005405219815]
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution.
We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution.
We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates, and we prove that the algorithms' sampling ratios achieve the conditions as the simulation budget increases.
(arXiv 2022-07-15)
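
The MPB definition suggests a direct Monte Carlo estimate: sample the common parameter from its distribution, record which solution is conditionally best for each draw, and pick the most frequent winner. A sketch with invented conditional means (larger is better):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: three solutions whose conditional means depend on a common
# parameter theta; theta's uncertainty is modeled as a standard normal.
thetas = rng.normal(size=100_000)
means = np.stack([1.0 + 0.5 * thetas,    # solution 0
                  1.2 - 0.3 * thetas,    # solution 1
                  1.3 - 0.9 * thetas])   # solution 2

winners = means.argmax(axis=0)           # conditional best for each theta draw
p_best = np.bincount(winners, minlength=3) / winners.size
print("P(solution is optimal):", np.round(p_best, 3))
print("most probable best: solution", int(p_best.argmax()))
```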
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
(arXiv 2021-10-08)
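
As a generic sketch of elimination-based active sampling (not the paper's algorithm, which additionally models heterogeneous user accuracies and full rankings), the loop below keeps querying noisy comparisons and drops items whose upper confidence bound falls below the best lower bound:

```python
import numpy as np

rng = np.random.default_rng(7)
quality = np.array([0.9, 0.7, 0.5, 0.3, 0.1])   # hidden item qualities

def compare(i, j):
    """Noisy comparison: i beats j with Bradley-Terry probability."""
    return rng.random() < quality[i] / (quality[i] + quality[j])

n = len(quality)
active = list(range(n))
wins, plays = np.zeros(n), np.zeros(n)
delta = 0.05

t = 0
while len(active) > 1 and t < 5_000:
    t += 1
    for i in active:                               # one round-robin pass
        j = rng.choice([a for a in active if a != i])
        wins[i] += compare(i, j)
        plays[i] += 1
    rate = wins[active] / plays[active]
    radius = np.sqrt(np.log(2 * n * t / delta) / (2 * plays[active]))
    best_lcb = np.max(rate - radius)               # best lower confidence bound
    active = [a for a, r, rad in zip(active, rate, radius) if r + rad >= best_lcb]

print("surviving item(s) after", t, "rounds:", active)
```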
- Automated Configuration of Genetic Algorithms by Tuning for Anytime Performance [4.33419118449588]
We show that it might be preferable to use anytime performance measures for the configuration task.
Tuning for expected running time is much more sensitive to the budget that is allocated to the target algorithms.
(arXiv 2021-06-11)
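
"Anytime performance" is not pinned down in the summary; one common instantiation (an assumption here, not necessarily the paper's exact measure) is the normalized area under the best-so-far quality curve, which rewards fast early progress rather than only the final value:

```python
import numpy as np

def anytime_score(best_so_far, f_opt, f_max):
    """Normalized area under the best-so-far curve: 1.0 means the optimum
    was found immediately, 0.0 means no progress within the budget."""
    gap = (np.asarray(best_so_far, dtype=float) - f_opt) / (f_max - f_opt)
    return 1.0 - gap.mean()

# Two runs that reach the same final value at very different speeds.
fast = [10, 4, 2, 1, 1, 1, 1, 1]
slow = [10, 9, 8, 7, 6, 4, 2, 1]
print(anytime_score(fast, f_opt=1, f_max=10))   # ~ 0.82
print(anytime_score(slow, f_opt=1, f_max=10))   # ~ 0.46
```

A final-target measure such as expected running time would rate these two runs identically, which is one way to see why it reacts so strongly to the chosen budget.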
- On the Assessment of Benchmark Suites for Algorithm Comparison [7.501426386641256]
We show that most benchmark functions of the BBOB suite have high difficulty levels (compared to the optimization algorithms) and low discrimination.
We discuss potential uses of IRT in benchmarking, including its use to improve the design of benchmark suites.
(arXiv 2021-04-15)
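
The IRT terminology can be made concrete with the standard two-parameter logistic model (a common choice, assumed here): a function with difficulty b and discrimination a is solved by an algorithm of ability theta with probability sigmoid(a * (theta - b)). Low discrimination means the success probability barely changes with ability, so the function separates algorithms poorly:

```python
import numpy as np

def p_solve(theta, a, b):
    """2PL item-response model: P(algorithm with ability theta solves a
    benchmark function with difficulty b and discrimination a)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

abilities = np.array([-1.0, 0.0, 1.0])        # weak / average / strong
print(p_solve(abilities, a=2.0, b=0.5))       # discriminating: ~[0.05 0.27 0.73]
print(p_solve(abilities, a=0.2, b=2.0))       # hard, low discrimination: ~[0.35 0.40 0.45]
```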
- A Framework for Sample Efficient Interval Estimation with Control Variates [94.32811054797148]
We consider the problem of estimating confidence intervals for the mean of a random variable.
Under certain conditions, we show improved efficiency compared to existing estimation algorithms.
(arXiv 2020-06-18)
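
The control-variate construction behind this entry is classical: given a correlated variable X with known mean, subtracting beta * (X - E[X]) from the samples shrinks their variance, and hence the confidence interval, without introducing bias. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Y is what we want the mean of; X is correlated with a KNOWN mean of 1.0.
x = rng.exponential(1.0, n)            # E[X] = 1.0, known analytically
y = 2.0 * x + rng.normal(0.0, 0.5, n)  # target variable, true mean 2.0

beta = np.cov(y, x)[0, 1] / np.var(x)  # variance-minimizing coefficient
y_cv = y - beta * (x - 1.0)            # control-variate adjusted samples

for name, s in (("plain", y), ("control variate", y_cv)):
    half = 1.96 * s.std(ddof=1) / np.sqrt(n)
    print(f"{name:16s} mean={s.mean():.4f} +/- {half:.4f}")
```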
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of instability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
(arXiv 2020-05-22)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
(arXiv 2020-02-26)
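
The least-squares criterion can be written down directly: each observed comparison contributes one equation q_i - q_j ~ y_ij, and stacking these into an incidence matrix yields an ordinary least-squares problem (a minimal sketch on synthetic comparisons; the paper's estimator is more refined):

```python
import numpy as np

rng = np.random.default_rng(5)
q_true = np.array([3.0, 1.0, 2.0, 0.0])            # hidden qualities
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)] * 10

# Each comparison observes the quality gap q_i - q_j plus noise.
A = np.zeros((len(pairs), 4))
y = np.zeros(len(pairs))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0
    y[row] = q_true[i] - q_true[j] + rng.normal(0.0, 0.5)

q_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
q_hat -= q_hat.mean()                               # scores identifiable up to a shift
print("estimated ranking:", np.argsort(-q_hat))     # expected: [0 2 1 3]
```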
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.