Experiments as Code: A Concept for Reproducible, Auditable, Debuggable,
Reusable, & Scalable Experiments
- URL: http://arxiv.org/abs/2202.12050v1
- Date: Thu, 24 Feb 2022 12:15:00 GMT
- Title: Experiments as Code: A Concept for Reproducible, Auditable, Debuggable,
Reusable, & Scalable Experiments
- Authors: Leonel Aguilar, Michal Gath-Morad, Jascha Grübel, Jasper Ermatinger,
Hantao Zhao, Stefan Wehrli, Robert W. Sumner, Ce Zhang, Dirk Helbing,
Christoph Hölscher
- Abstract summary: A common concern in experimental research is the auditability and reproducibility of experiments.
We propose the "Experiments as Code" paradigm, where the whole experiment is not only documented but additionally the automation code is provided.
- Score: 7.557948558412152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common concern in experimental research is the auditability and
reproducibility of experiments. Experiments are usually designed, provisioned,
managed, and analyzed by diverse teams of specialists (e.g., researchers,
technicians and engineers) and may require many resources (e.g. cloud
infrastructure, specialized equipment). Even though researchers strive to
document experiments accurately, this process is often lacking, making it hard
to reproduce them. Moreover, when it is necessary to create a similar
experiment, very often we end up "reinventing the wheel" as it is easier to
start from scratch than trying to reuse existing work, thus losing valuable
embedded best practices and previous experiences. In behavioral studies this
has contributed to the reproducibility crisis. To tackle this challenge, we
propose the "Experiments as Code" paradigm, where the whole experiment is not
only documented but additionally the automation code to provision, deploy,
manage, and analyze it is provided. To this end we define the Experiments as
Code concept, provide a taxonomy for the components of a practical
implementation, and provide a proof of concept with a simple desktop VR
experiment that showcases the benefits of its "as code" representation, i.e.,
reproducibility, auditability, debuggability, reusability, and scalability.
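To make the paradigm concrete, below is a minimal Python sketch of what an "as code" representation of a simple experiment could look like. The four stage names (provision, deploy, manage, analyze) follow the automation steps named in the abstract; the ExperimentSpec fields, the local file-system backend, and the dummy session data are illustrative assumptions, not the paper's actual proof-of-concept implementation.

```python
# Hypothetical "Experiments as Code" sketch: a declarative spec plus one
# plain function per automation stage.  Everything beyond the stage names
# is an illustrative assumption, not the paper's implementation.
from dataclasses import dataclass, field
from pathlib import Path
import json


@dataclass
class ExperimentSpec:
    """Declarative description of the experiment, versioned alongside the code."""
    name: str
    participants: int
    scene: str                      # e.g. an identifier for a desktop VR scene
    output_dir: Path = Path("results")
    parameters: dict = field(default_factory=dict)


def provision(spec: ExperimentSpec) -> None:
    """Create the resources the run needs (here just a results folder;
    a real setup might request cloud machines or lab equipment)."""
    spec.output_dir.mkdir(parents=True, exist_ok=True)


def deploy(spec: ExperimentSpec) -> None:
    """Install or start the experiment software on the provisioned resources."""
    # Placeholder: a real pipeline might build a container image or push a
    # VR application to the participant machines.
    print(f"deploying scene '{spec.scene}' for experiment '{spec.name}'")


def manage(spec: ExperimentSpec) -> list[dict]:
    """Run the sessions and collect raw, per-participant records."""
    # Dummy data standing in for whatever the experiment actually measures.
    return [{"participant": i, "score": (i * 7) % 5} for i in range(spec.participants)]


def analyze(spec: ExperimentSpec, records: list[dict]) -> Path:
    """Write an auditable summary next to the raw records."""
    summary = {
        "spec": {"name": spec.name, "scene": spec.scene, **spec.parameters},
        "n": len(records),
        "mean_score": sum(r["score"] for r in records) / max(len(records), 1),
    }
    out_path = spec.output_dir / f"{spec.name}_summary.json"
    out_path.write_text(json.dumps(summary, indent=2))
    return out_path


if __name__ == "__main__":
    spec = ExperimentSpec(name="desktop_vr_demo", participants=12, scene="office_v1")
    provision(spec)
    deploy(spec)
    print("summary written to", analyze(spec, manage(spec)))
```

Keeping the specification declarative and the stages as plain functions is what would make such a run auditable (the written summary records the spec it was produced from) and reusable (a different experiment swaps in a different spec without touching the stage code).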
Related papers
- AExGym: Benchmarks and Environments for Adaptive Experimentation [7.948144726705323]
We present a benchmark for adaptive experimentation based on real-world datasets.
We highlight prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity.
arXiv Detail & Related papers (2024-08-08T15:32:12Z) - Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z) - DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics, each with three levels of difficulty and several parametric variations.
We find that strong baseline agents, which perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z) - MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z) - Content and structure of laboratory packages for software engineering
experiments [1.3584003182788122]
This paper investigates the experiment replication process to find out what information is needed to successfully replicate an experiment.
Our objective is to propose the content and structure of laboratory packages for software engineering experiments.
arXiv Detail & Related papers (2024-02-11T14:29:15Z) - ExPT: Synthetic Pretraining for Few-Shot Experimental Design [33.5918976228562]
Experiment Pretrained Transformers (ExPT) is a foundation model for few-shot experimental design.
ExPT employs a novel combination of synthetic pretraining with in-context learning.
We evaluate ExPT on few-shot experimental design in challenging domains.
arXiv Detail & Related papers (2023-10-30T19:25:43Z) - A Backend Platform for Supporting the Reproducibility of Computational
Experiments [2.1485350418225244]
It is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on.
In this work, we propose an Integrated Development Environment that allows the sharing, configuration, packaging, and execution of an experiment.
We were able to successfully reproduce 20 (80%) of these experiments, achieving the results reported in those works with minimal effort.
arXiv Detail & Related papers (2023-06-29T10:29:11Z) - GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z) - Benchopt: Reproducible, efficient and collaborative optimization
benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z) - dagger: A Python Framework for Reproducible Machine Learning Experiment
Orchestration [0.913755431537592]
Multi-stage experiments in machine learning often involve state-mutating operations acting on models along multiple paths of execution.
We present dagger, a framework to facilitate reproducible and reusable experiment orchestration.
arXiv Detail & Related papers (2020-06-12T21:42:48Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.