A Guide to Reproducible Research in Signal Processing and Machine Learning
- URL: http://arxiv.org/abs/2108.12383v1
- Date: Fri, 27 Aug 2021 16:42:32 GMT
- Title: A Guide to Reproducible Research in Signal Processing and Machine Learning
- Authors: Joseph Shenouda and Waheed U. Bajwa
- Abstract summary: In 2016 a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments.
We aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.
- Score: 9.69596041242667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reproducibility is a growing problem that has been extensively studied among
computational researchers and within the signal processing and machine learning
research community. However, with the changing landscape of signal processing
and machine learning research come new obstacles and unseen challenges in
creating reproducible experiments. Due to these new challenges, most experiments
have become difficult, if not impossible, for an independent researcher to
reproduce. In 2016, a survey conducted by the journal Nature found that 50% of
researchers were unable to reproduce their own experiments. While the issue of
reproducibility has been discussed in the literature and specifically within
the signal processing community, it is still unclear to most researchers what
the best practices are for ensuring reproducibility without impinging on their
primary responsibility of conducting research. We feel that although
researchers understand the importance of making experiments reproducible, the
lack of a clear set of standards and tools makes it difficult to incorporate
good reproducibility practices in most labs. It is in this regard that we aim
to present signal processing researchers with a set of practical tools and
strategies that can help mitigate many of the obstacles to producing
reproducible computational experiments.
Related papers
- DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents that perform well in previously published environments struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z)
- Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research [2.3265565167163906]
Empirical research plays a fundamental role in the machine learning domain.
We propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research.
arXiv Detail & Related papers (2024-05-28T11:37:59Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- Reproducibility and Geometric Intrinsic Dimensionality: An Investigation on Graph Neural Network Research [0.0]
Building on these efforts we turn towards another critical challenge in machine learning, namely the curse of dimensionality.
Using the closely linked concept of intrinsic dimension, we investigate to what extent the machine learning models used are influenced by the intrinsic dimension of the data sets they are trained on.
arXiv Detail & Related papers (2024-03-13T11:44:30Z)
- MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypothesis generation, and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
- PyExperimenter: Easily distribute experiments and track results [63.871474825689134]
PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms.
It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
arXiv Detail & Related papers (2023-01-16T10:43:02Z)
- Sources of Irreproducibility in Machine Learning: A Review [3.905855359082687]
There exists no theoretical framework that relates experiment design choices to potential effects on the conclusions.
The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings.
arXiv Detail & Related papers (2022-04-15T18:26:03Z)
- A user-centered approach to designing an experimental laboratory data platform [0.0]
We take a user-centered approach to understand what essential elements of design and functionality researchers want in an experimental data platform.
We find that having the capability to contextualize rich, complex experimental datasets is the primary user requirement.
arXiv Detail & Related papers (2020-07-28T19:26:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.