A Dataset For Computational Reproducibility
- URL: http://arxiv.org/abs/2504.08684v1
- Date: Fri, 11 Apr 2025 16:45:10 GMT
- Title: A Dataset For Computational Reproducibility
- Authors: Lázaro Costa, Susana Barbosa, Jácome Cunha
- Abstract summary: This article introduces a dataset of computational experiments covering a broad spectrum of scientific fields. It incorporates details about software dependencies, execution steps, and configurations necessary for accurate reproduction. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of tools.
- Score: 2.147712260420443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring the reproducibility of scientific work is crucial as it allows the consistent verification of scientific claims and facilitates the advancement of knowledge by providing a reliable foundation for future research. However, scientific work based on computational artifacts, such as scripts for statistical analysis or software prototypes, faces significant challenges in achieving reproducibility. These challenges are based on the variability of computational environments, rapid software evolution, and inadequate documentation of procedures. As a consequence, such artifacts often are not (easily) reproducible, undermining the credibility of scientific findings. The evaluation of reproducibility approaches, in particular of tools, is challenging in many aspects, one being the need to test them with the correct inputs, in this case computational experiments. Thus, this article introduces a curated dataset of computational experiments covering a broad spectrum of scientific fields, incorporating details about software dependencies, execution steps, and configurations necessary for accurate reproduction. The dataset is structured to reflect diverse computational requirements and methodologies, ranging from simple scripts to complex, multi-language workflows, ensuring it presents the wide range of challenges researchers face in reproducing computational studies. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of reproducibility tools. Each experiment included in the dataset is carefully documented to ensure ease of use. We added clear instructions following a standard, so each experiment has the same kind of instructions, making it easier for researchers to run each of them with their own reproducibility tool.
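The per-experiment documentation described in the abstract (software dependencies, execution steps, configurations, and uniform instructions) can be pictured as a small manifest plus a validation step. The Python sketch below is purely illustrative: the field names and layout are assumptions made for this example, not the dataset's actual schema.

```python
# Hypothetical sketch of a standardized experiment record, assuming a
# manifest with dependencies, execution steps, and configuration.
# Field names are illustrative, not the dataset's actual schema.

REQUIRED_FIELDS = {"name", "language", "dependencies", "steps"}

def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - manifest.keys()]
    if not manifest.get("steps"):
        problems.append("no execution steps listed")
    return problems

example = {
    "name": "statistical-analysis-01",          # hypothetical experiment id
    "language": "R",
    "dependencies": ["r-base=4.2", "ggplot2"],  # pinned versions aid reproduction
    "steps": ["Rscript analysis.R"],            # ordered commands to reproduce
    "config": {"seed": 42},                     # fixed configuration for determinism
}

print(validate_manifest(example))
```

A uniform record like this is what lets a reproducibility tool consume every experiment in the same way, regardless of field or language.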
Related papers
- Probing the limitations of multimodal language models for chemistry and materials research [3.422786943576035]
We introduce MaCBench, a benchmark for evaluating how vision-language models handle real-world chemistry and materials science tasks. We find that while these systems show promising capabilities in basic perception tasks, they exhibit fundamental limitations in spatial reasoning, cross-modal information synthesis, and logical inference. Our insights have important implications beyond chemistry and materials science, suggesting that developing reliable multimodal AI scientific assistants may require advances in curating suitable training data and approaches to training those models.
arXiv Detail & Related papers (2024-11-25T21:51:45Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.
We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.
Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z) - Towards Controlled Table-to-Text Generation with Scientific Reasoning [46.87189607486007]
We present a new task for generating fluent and logical descriptions that match user preferences over scientific data, aiming to automate scientific document analysis.
We construct a new challenging dataset, SciTab, consisting of table-description pairs extracted from the scientific literature, with highlighted cells and a corresponding domain-specific knowledge base.
The results showed that large models struggle to produce accurate content that aligns with user preferences. As the first of its kind, our work should motivate further research in scientific domains.
arXiv Detail & Related papers (2023-12-08T22:57:35Z) - Managing Software Provenance to Enhance Reproducibility in Computational Research [1.1421942894219899]
Management of computation-based scientific studies is often left to individual researchers who design their experiments based on personal preferences and the nature of the study.
We believe that the quality, efficiency, and reproducibility of computation-based scientific research can be improved by explicitly creating an execution environment that allows researchers to provide a clear record of traceability.
arXiv Detail & Related papers (2023-08-29T21:13:18Z) - A Backend Platform for Supporting the Reproducibility of Computational Experiments [2.1485350418225244]
It is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on.
In this work, we propose an Integrated Development Environment allowing the sharing, configuration, packaging, and execution of an experiment.
We have been able to successfully reproduce 20 (80%) of these experiments, achieving the results reported in those works with minimal effort.
arXiv Detail & Related papers (2023-06-29T10:29:11Z) - Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z) - PyExperimenter: Easily distribute experiments and track results [63.871474825689134]
PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms.
It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
arXiv Detail & Related papers (2023-01-16T10:43:02Z) - Experiments as Code: A Concept for Reproducible, Auditable, Debuggable, Reusable, & Scalable Experiments [7.557948558412152]
A common concern in experimental research is the auditability and reproducibility of experiments.
We propose the "Experiments as Code" paradigm, in which the whole experiment is not only documented but the automation code for running it is provided as well.
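As a rough illustration of that paradigm, and not the authors' actual tooling, an experiment can ship its automation as an executable artifact alongside its description. In this hypothetical Python sketch, the function names and parameters are invented for the example:

```python
# Minimal "experiments as code" sketch: the run itself is executable code,
# so it is auditable, debuggable, and re-runnable. Names are illustrative only.
import json
import random

def run_experiment(seed: int = 42, trials: int = 1000) -> dict:
    """A fully automated, parameterized run; a fixed seed makes it repeatable."""
    rng = random.Random(seed)  # seeded instance -> deterministic sequence
    hits = sum(rng.random() < 0.5 for _ in range(trials))
    return {"seed": seed, "trials": trials, "hit_rate": hits / trials}

if __name__ == "__main__":
    # The emitted record doubles as an audit log of this exact run.
    print(json.dumps(run_experiment(), indent=2))
```

Because parameters and the seed travel with the code, anyone re-running the script reproduces the identical result, which is the auditability property the paradigm targets.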
arXiv Detail & Related papers (2022-02-24T12:15:00Z) - Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z) - A user-centered approach to designing an experimental laboratory data platform [0.0]
We take a user-centered approach to understand what essential elements of design and functionality researchers want in an experimental data platform.
We find that having the capability to contextualize rich, complex experimental datasets is the primary user requirement.
arXiv Detail & Related papers (2020-07-28T19:26:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.