Experiments as Code: A Concept for Reproducible, Auditable, Debuggable,
Reusable, & Scalable Experiments
- URL: http://arxiv.org/abs/2202.12050v1
- Date: Thu, 24 Feb 2022 12:15:00 GMT
- Title: Experiments as Code: A Concept for Reproducible, Auditable, Debuggable,
Reusable, & Scalable Experiments
- Authors: Leonel Aguilar, Michal Gath-Morad, Jascha Grübel, Jasper Ermatinger,
Hantao Zhao, Stefan Wehrli, Robert W. Sumner, Ce Zhang, Dirk Helbing,
Christoph Hölscher
- Abstract summary: A common concern in experimental research is the auditability and reproducibility of experiments.
We propose the "Experiments as Code" paradigm, where the whole experiment is not only documented but additionally the automation code is provided.
- Score: 7.557948558412152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common concern in experimental research is the auditability and
reproducibility of experiments. Experiments are usually designed, provisioned,
managed, and analyzed by diverse teams of specialists (e.g., researchers,
technicians and engineers) and may require many resources (e.g. cloud
infrastructure, specialized equipment). Even though researchers strive to
document experiments accurately, this process is often lacking, making it hard
to reproduce them. Moreover, when it is necessary to create a similar
experiment, very often we end up "reinventing the wheel" as it is easier to
start from scratch than trying to reuse existing work, thus losing valuable
embedded best practices and previous experiences. In behavioral studies this
has contributed to the reproducibility crisis. To tackle this challenge, we
propose the "Experiments as Code" paradigm, where the whole experiment is not
only documented but additionally the automation code to provision, deploy,
manage, and analyze it is provided. To this end we define the Experiments as
Code concept, provide a taxonomy for the components of a practical
implementation, and provide a proof of concept with a simple desktop VR
experiment that showcases the benefits of its "as code" representation, i.e.,
reproducibility, auditability, debuggability, reusability, and scalability.
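To make the paradigm concrete, below is a minimal Python sketch of what an "as code" representation of a simple experiment could look like. The four stage names (provision, deploy, manage, analyze) follow the automation steps named in the abstract; the ExperimentSpec fields, the local file-system backend, and the dummy session data are illustrative assumptions, not the paper's actual proof-of-concept implementation.

```python
# Hypothetical "Experiments as Code" sketch: a declarative spec plus one
# plain function per automation stage.  Everything beyond the stage names
# is an illustrative assumption, not the paper's implementation.
from dataclasses import dataclass, field
from pathlib import Path
import json


@dataclass
class ExperimentSpec:
    """Declarative description of the experiment, versioned alongside the code."""
    name: str
    participants: int
    scene: str                      # e.g. an identifier for a desktop VR scene
    output_dir: Path = Path("results")
    parameters: dict = field(default_factory=dict)


def provision(spec: ExperimentSpec) -> None:
    """Create the resources the run needs (here just a results folder;
    a real setup might request cloud machines or lab equipment)."""
    spec.output_dir.mkdir(parents=True, exist_ok=True)


def deploy(spec: ExperimentSpec) -> None:
    """Install or start the experiment software on the provisioned resources."""
    # Placeholder: a real pipeline might build a container image or push a
    # VR application to the participant machines.
    print(f"deploying scene '{spec.scene}' for experiment '{spec.name}'")


def manage(spec: ExperimentSpec) -> list[dict]:
    """Run the sessions and collect raw, per-participant records."""
    # Dummy data standing in for whatever the experiment actually measures.
    return [{"participant": i, "score": (i * 7) % 5} for i in range(spec.participants)]


def analyze(spec: ExperimentSpec, records: list[dict]) -> Path:
    """Write an auditable summary next to the raw records."""
    summary = {
        "spec": {"name": spec.name, "scene": spec.scene, **spec.parameters},
        "n": len(records),
        "mean_score": sum(r["score"] for r in records) / max(len(records), 1),
    }
    out_path = spec.output_dir / f"{spec.name}_summary.json"
    out_path.write_text(json.dumps(summary, indent=2))
    return out_path


if __name__ == "__main__":
    spec = ExperimentSpec(name="desktop_vr_demo", participants=12, scene="office_v1")
    provision(spec)
    deploy(spec)
    print("summary written to", analyze(spec, manage(spec)))
```

Keeping the specification declarative and the stages as plain functions is what would make such a run auditable (the written summary records the spec it was produced from) and reusable (a different experiment swaps in a different spec without touching the stage code).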
Related papers
- AExGym: Benchmarks and Environments for Adaptive Experimentation [7.948144726705323]
We present a benchmark for adaptive experimentation based on real-world datasets.
We highlight prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity.
arXiv Detail & Related papers (2024-08-08T15:32:12Z) - Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z) - DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics, each with three levels of difficulty and several parametric variations.
We find that strong baseline agents, which perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z) - MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z) - Content and structure of laboratory packages for software engineering
experiments [1.3584003182788122]
This paper investigates the experiment replication process to find out what information is needed to successfully replicate an experiment.
Our objective is to propose the content and structure of laboratory packages for software engineering experiments.
arXiv Detail & Related papers (2024-02-11T14:29:15Z) - ExPT: Synthetic Pretraining for Few-Shot Experimental Design [33.5918976228562]
Experiment Pretrained Transformers (ExPT) is a foundation model for few-shot experimental design.
ExPT employs a novel combination of synthetic pretraining with in-context learning.
We evaluate ExPT on few-shot experimental design in challenging domains.
arXiv Detail & Related papers (2023-10-30T19:25:43Z) - A Backend Platform for Supporting the Reproducibility of Computational
Experiments [2.1485350418225244]
It is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on.
In this work, we propose an Integrated Development Environment that allows the sharing, configuration, packaging, and execution of an experiment.
We were able to successfully reproduce 20 (80%) of these experiments, achieving the results reported in those works with minimal effort.
arXiv Detail & Related papers (2023-06-29T10:29:11Z) - GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z) - Benchopt: Reproducible, efficient and collaborative optimization
benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z) - dagger: A Python Framework for Reproducible Machine Learning Experiment
Orchestration [0.913755431537592]
Multi-stage experiments in machine learning often involve state-mutating operations acting on models along multiple paths of execution.
We present dagger, a framework to facilitate reproducible and reusable experiment orchestration.
arXiv Detail & Related papers (2020-06-12T21:42:48Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.