On the role of benchmarking data sets and simulations in method
comparison studies
- URL: http://arxiv.org/abs/2208.01457v1
- Date: Tue, 2 Aug 2022 13:47:53 GMT
- Title: On the role of benchmarking data sets and simulations in method
comparison studies
- Authors: Sarah Friedrich and Tim Friede
- Abstract summary: This paper investigates differences and similarities between simulation studies and benchmarking studies.
We borrow ideas from different contexts such as mixed methods research and Clinical Scenario Evaluation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Method comparisons are essential to provide recommendations and guidance for
applied researchers, who often have to choose from a plethora of available
approaches. While many comparisons exist in the literature, these are often not
neutral but favour a novel method. Apart from the choice of design and a proper
reporting of the findings, there are different approaches concerning the
underlying data for such method comparison studies. Most manuscripts on
statistical methodology rely on simulation studies and provide a single
real-world data set as an example to motivate and illustrate the methodology
investigated. In the context of supervised learning, in contrast, methods are
often evaluated using so-called benchmarking data sets, i.e. real-world data
that serve as a gold standard in the community. Simulation studies, on the other
hand, are much less common in this context. The aim of this paper is to
investigate differences and similarities between these approaches, to discuss
their advantages and disadvantages and ultimately to develop new approaches to
the evaluation of methods, combining the best of both worlds. To this end, we
borrow ideas from different contexts such as mixed methods research and
Clinical Scenario Evaluation.
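To make the contrast concrete, here is a minimal sketch of the two designs the abstract compares: a simulation study, where the data-generating process (and hence the ground truth) is known and can be varied, and a benchmark study, where methods are ranked on fixed real-world data sets. The candidate methods, data sets, and sample sizes below are hypothetical illustrations, not choices from the paper.

```python
# A minimal, self-contained sketch (not from the paper) contrasting the two
# evaluation approaches discussed in the abstract.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

methods = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(random_state=0),
}

# --- Simulation study: the data-generating process (and hence the truth)
# --- is fully known and can be varied systematically.
rng = np.random.default_rng(0)

def simulate(n=500, p=5, noise=1.0):
    X = rng.normal(size=(n, p))
    beta = np.arange(1, p + 1)            # known ground-truth coefficients
    y = X @ beta + noise * rng.normal(size=n)
    return X, y

sim_results = {name: [] for name in methods}
for rep in range(100):                    # repetitions average out sampling noise
    X, y = simulate()
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=rep)
    for name, model in methods.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        sim_results[name].append(mean_squared_error(y_te, pred))
for name, mses in sim_results.items():
    print(f"simulation  {name:7s} mean MSE = {np.mean(mses):.3f}")

# --- Benchmark study: fixed real-world data sets serve as the testbed;
# --- the truth is unknown, but the data are realistic.
from sklearn.datasets import load_diabetes, fetch_california_housing
benchmarks = {
    "diabetes": load_diabetes(return_X_y=True),
    "california": fetch_california_housing(return_X_y=True),
}
for ds_name, (X, y) in benchmarks.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for name, model in methods.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        print(f"benchmark   {ds_name:10s} {name:7s} MSE = {mse:.3f}")
```

The simulation arm shows how methods behave under known, controllable conditions, while the benchmark arm shows how they rank on realistic data whose truth is unknown; the abstract's proposal is to combine the strengths of both designs.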
Related papers
- Experimental Analysis of Large-scale Learnable Vector Storage Compression [42.52474894105165]
Learnable embedding vectors are one of the most important applications in machine learning.
The high dimensionality of sparse data in recommendation tasks and the huge corpus volumes in retrieval-related tasks lead to large memory consumption for the embedding table.
Recent research has proposed various methods to compress the embeddings at the cost of a slight decrease in model quality or the introduction of other overheads.
arXiv Detail & Related papers (2023-11-27T07:11:47Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- Academics evaluating academics: a methodology to inform the review process on top of open citations [1.911678487931003]
We explore whether citation-based metrics, calculated considering only open citations, provide data that can yield insights into how the human peer review of research assessment exercises is conducted.
We propose to use a series of machine learning models to replicate the decisions of the committees of the research assessment exercises.
arXiv Detail & Related papers (2021-06-10T13:09:15Z)
- Comprehensive Comparative Study of Multi-Label Classification Methods [1.1278903078792917]
Multi-label classification (MLC) has recently received increasing interest from the machine learning community.
This work provides a comprehensive empirical study of a wide range of MLC methods on a plethora of datasets from various domains.
arXiv Detail & Related papers (2021-02-14T09:38:15Z)
- A Discussion on Practical Considerations with Sparse Regression Methodologies [0.0]
Two papers published in Statistical Science study the comparative performance of several sparse regression methodologies.
We summarize and compare the two studies and aim to provide clarity and value to users.
arXiv Detail & Related papers (2020-11-18T15:58:35Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Adaptive Estimator Selection for Off-Policy Evaluation [48.66170976187225]
We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings.
We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor.
arXiv Detail & Related papers (2020-02-18T16:57:42Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various methods for estimating causal effects from observational data have emerged.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.