Towards Realistic Optimization Benchmarks: A Questionnaire on the
Properties of Real-World Problems
- URL: http://arxiv.org/abs/2004.06395v1
- Date: Tue, 14 Apr 2020 10:04:38 GMT
- Title: Towards Realistic Optimization Benchmarks: A Questionnaire on the
Properties of Real-World Problems
- Authors: Koen van der Blom, Timo M. Deist, Tea Tušar, Mariapia Marchi,
Yusuke Nojima, Akira Oyama, Vanessa Volz, Boris Naujoks
- Abstract summary: This work aims to identify properties of real-world problems through a questionnaire.
A few challenges that have to be considered in the design of realistic benchmarks can already be identified.
A key point for future work is to gather more responses to the questionnaire.
- Score: 2.805617945875364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmarks are a useful tool for empirical performance comparisons. However,
one of the main shortcomings of existing benchmarks is that it remains largely
unclear how they relate to real-world problems. What does an algorithm's
performance on a benchmark say about its potential on a specific real-world
problem? This work aims to identify properties of real-world problems through a
questionnaire on real-world single-, multi-, and many-objective optimization
problems. Based on initial responses, a few challenges that have to be
considered in the design of realistic benchmarks can already be identified. A
key point for future work is to gather more responses to the questionnaire to
allow an analysis of common combinations of properties. In turn, such common
combinations can then be included in improved benchmark suites. To gather more
data, the reader is invited to participate in the questionnaire at:
https://tinyurl.com/opt-survey
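As a purely illustrative sketch (not taken from the paper), the planned analysis of common property combinations could amount to tallying tuples of reported properties across questionnaire responses; the field names and values below are hypothetical placeholders, not the actual questionnaire items.

from collections import Counter

# Hypothetical questionnaire responses: each entry records a few
# reported problem properties (the keys are assumptions for
# illustration, not the real survey questions).
responses = [
    {"objectives": "multi", "constrained": True, "expensive_eval": True},
    {"objectives": "single", "constrained": True, "expensive_eval": False},
    {"objectives": "multi", "constrained": True, "expensive_eval": True},
]

# Count how often each combination of properties occurs.
combinations = Counter(
    (r["objectives"], r["constrained"], r["expensive_eval"]) for r in responses
)

# The most frequent combinations would be candidates for inclusion
# in a more realistic benchmark suite.
for combo, count in combinations.most_common():
    print(combo, count)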
Related papers
- RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
- TVBench: Redesigning Video-Language Evaluation [48.71203934876828]
We show that the currently most used video-language benchmarks can be solved without requiring much temporal reasoning.
We propose TVBench, a novel open-source video multiple-choice question-answering benchmark.
arXiv Detail & Related papers (2024-10-10T09:28:36Z)
- Do Text-to-Vis Benchmarks Test Real Use of Visualisations? [11.442971909006657]
This paper investigates whether benchmarks reflect real-world use through an empirical study comparing benchmark datasets with code from public repositories.
Our findings reveal a substantial gap, with evaluations not testing the same distribution of chart types, attributes, and actions as real-world examples.
One dataset is representative, but requires extensive modification to become a practical end-to-end benchmark.
This shows that new benchmarks are needed to support the development of systems that truly address users' visualisation needs.
arXiv Detail & Related papers (2024-07-29T06:13:28Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- Benchmarking Video Frame Interpolation [11.918489436283748]
We present a benchmark which establishes consistent error metrics by utilizing a submission website that computes them.
We also present a test set adhering to the assumption of linearity by utilizing synthetic data, and evaluate the computational efficiency in a coherent manner.
arXiv Detail & Related papers (2024-03-25T19:13:12Z)
- Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component of many downstream tasks such as open-domain question answering (QA).
We propose an information retrieval pipeline that uses entity/event linking and query decomposition models to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- A Theoretically Grounded Benchmark for Evaluating Machine Commonsense [6.725087407394836]
Theoretically-Grounded Commonsense Reasoning (TG-CSR) is based on discriminative question answering, but with questions designed to evaluate diverse aspects of commonsense.
TG-CSR is based on a subset of commonsense categories first proposed as a viable theory of commonsense by Gordon and Hobbs.
Preliminary results suggest that the benchmark is challenging even for advanced language representation models designed for discriminative CSR question answering tasks.
arXiv Detail & Related papers (2022-03-23T04:06:01Z)
- A Complementarity Analysis of the COCO Benchmark Problems and Artificially Generated Problems [0.0]
In this paper, a single-objective continuous problem generation approach is analyzed and compared with the COCO benchmark problem set.
We show that such representations allow us to further explore the relations between the problems by applying visualization and correlation analysis techniques.
arXiv Detail & Related papers (2021-04-27T09:18:43Z)
- Identifying Properties of Real-World Optimisation Problems through a Questionnaire [2.805617945875364]
This work investigates the properties of real-world problems through a questionnaire.
It enables the design of future benchmark problems that more closely resemble those found in the real world.
arXiv Detail & Related papers (2020-11-11T05:09:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.