Related papers: A Guide to Reproducible Research in Signal Processing and Machine Learning

URL: http://arxiv.org/abs/2108.12383v1
Date: Fri, 27 Aug 2021 16:42:32 GMT
Title: A Guide to Reproducible Research in Signal Processing and Machine Learning
Authors: Joseph Shenouda and Waheed U. Bajwa
Abstract summary: In 2016 a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments. We aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.
Score: 9.69596041242667
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reproducibility is a growing problem that has been extensively studied among computational researchers and within the signal processing and machine learning research community. However, with the changing landscape of signal processing and machine learning research come new obstacles and unseen challenges in creating reproducible experiments. Due to these new challenges most experiments have become difficult, if not impossible, to be reproduced by an independent researcher. In 2016 a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments. While the issue of reproducibility has been discussed in the literature and specifically within the signal processing community, it is still unclear to most researchers what are the best practices to ensure reproducibility without impinging on their primary responsibility of conducting research. We feel that although researchers understand the importance of making experiments reproducible, the lack of a clear set of standards and tools makes it difficult to incorporate good reproducibility practices in most labs. It is in this regard that we aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.

Related papers

A Dataset For Computational Reproducibility [2.147712260420443]
This article introduces a dataset of computational experiments covering a broad spectrum of scientific fields. It incorporates details about software dependencies, execution steps, and configurations necessary for accurate reproduction. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of tools.
arXiv Detail & Related papers (2025-04-11T16:45:10Z)
Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers [1.4841630983274845]
Lack of transparency, data or code, poor adherence to standards, and sensitivity of ML training mean that many papers are not even reproducible in principle. Experiments have found worryingly low degrees of similarity with original results. Poor reproducibility threatens trust in and integrity of research results.
arXiv Detail & Related papers (2024-06-20T13:56:42Z)
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery. It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations. We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z)
Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research [2.3265565167163906]
Empirical research plays a fundamental role in the machine learning domain. We propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research.
arXiv Detail & Related papers (2024-05-28T11:37:59Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Reproducibility and Geometric Intrinsic Dimensionality: An Investigation on Graph Neural Network Research [0.0]
Building on these efforts we turn towards another critical challenge in machine learning, namely the curse of dimensionality. Using the closely linked concept of intrinsic dimension we investigate to what extent the machine learning models used are influenced by the intrinsic dimension of the data sets they are trained on.
arXiv Detail & Related papers (2024-03-13T11:44:30Z)
MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python. It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z)
Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges. We first present the model that underlies most current causal approaches to single-cell biology. We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets. GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
PyExperimenter: Easily distribute experiments and track results [63.871474825689134]
PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms. It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
arXiv Detail & Related papers (2023-01-16T10:43:02Z)
Sources of Irreproducibility in Machine Learning: A Review [3.905855359082687]
There exists no theoretical framework that relates experiment design choices to potential effects on the conclusions. The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings.
arXiv Detail & Related papers (2022-04-15T18:26:03Z)
A user-centered approach to designing an experimental laboratory data platform [0.0]
We take a user-centered approach to understand what essential elements of design and functionality researchers want in an experimental data platform. We find that having the capability to contextualize rich, complex experimental datasets is the primary user requirement.
arXiv Detail & Related papers (2020-07-28T19:26:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.