Sources of Irreproducibility in Machine Learning: A Review
- URL: http://arxiv.org/abs/2204.07610v2
- Date: Fri, 14 Apr 2023 17:31:27 GMT
- Title: Sources of Irreproducibility in Machine Learning: A Review
- Authors: Odd Erik Gundersen, Kevin Coakley, Christine Kirkpatrick and Yolanda Gil
- Abstract summary: There exists no theoretical framework that relates experiment design choices to their potential effects on the conclusions.
The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings.
- Score: 3.905855359082687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Many published machine learning studies are irreproducible.
Issues with methodology and a failure to properly account for variation introduced by
the algorithms themselves or their implementations are cited as the main
contributors to this irreproducibility. Problem: There exists no theoretical
framework that relates experiment design choices to their potential effects on the
conclusions. Without such a framework, it is much harder for practitioners and
researchers to evaluate experiment results and describe the limitations of
experiments. The lack of such a framework also makes it harder for independent
researchers to systematically attribute the causes of failed reproducibility
experiments. Objective: The objective of this paper is to develop a framework
that enables applied data science practitioners and researchers to understand
which experiment design choices can lead to false findings, and how, and thereby
to help them analyze the conclusions of reproducibility experiments. Method: We
have compiled an extensive list of factors reported in the literature that can
lead to machine learning studies being irreproducible. These factors are
organized and categorized in a reproducibility framework motivated by the
stages of the scientific method. The factors are analyzed for how they can
affect the conclusions drawn from experiments. A model comparison study is used
as an example. Conclusion: We provide a framework that describes machine
learning methodology from experimental design decisions to the conclusions
inferred from them.
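A model comparison study makes the abstract's point concrete. The following minimal sketch (our illustration, not code from the paper) trains two scikit-learn classifiers on identical data while varying only the seed that drives the algorithms' internal randomness; when the resulting accuracy ranges overlap, a single-run comparison can rank the models either way, which is exactly the kind of fragile conclusion the framework is meant to expose.

```python
# Illustrative sketch only: seed-driven variation in a model comparison.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {"forest": [], "mlp": []}
for seed in range(10):
    # Identical data and hyperparameters; only algorithm-internal randomness
    # (bootstrap sampling, weight initialization) changes between runs.
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    scores["forest"].append(forest.fit(X_tr, y_tr).score(X_te, y_te))
    scores["mlp"].append(mlp.fit(X_tr, y_tr).score(X_te, y_te))

for name, s in scores.items():
    print(f"{name}: mean={np.mean(s):.3f}, range=[{min(s):.3f}, {max(s):.3f}]")
# Overlapping ranges mean a single-seed comparison could rank the models
# either way -- one concrete source of irreproducible conclusions.
```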
Related papers
- Hypothesizing Missing Causal Variables with LLMs [55.28678224020973] (arXiv, 2024-09-04)
We formulate a novel task where the input is a partial causal graph with missing variables, and the output is a hypothesis about the missing variables to complete the partial graph.
We show the strong ability of LLMs to hypothesize the mediation variables between a cause and its effect.
We also observe surprising results where some of the open-source models outperform the closed GPT-4 model.
- Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research [2.3265565167163906] (arXiv, 2024-05-28)
Empirical research plays a fundamental role in the machine learning domain.
We propose a model for the empirical research process, accompanied by guidelines to uphold the validity of empirical research.
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007] (arXiv, 2024-05-27)
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
- Examining the Effect of Implementation Factors on Deep Learning Reproducibility [1.4295431367554867] (arXiv, 2023-12-11)
Three deep learning experiments were each run five times on 13 different hardware environments and four different software environments.
Accuracy on the same deterministic examples varied by more than 6% due to hardware or software environment variations alone.
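A common mitigation for implementation-level variation is to pin seeds and request deterministic kernels. Below is a minimal sketch, assuming PyTorch (not code from the paper); note that this cannot remove cross-hardware differences like those the study measures.

```python
# Hedged sketch: reduce implementation-level nondeterminism in PyTorch.
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Raise an error instead of silently using a nondeterministic CUDA kernel.
    torch.use_deterministic_algorithms(True)
    # Required by cuBLAS for deterministic behavior on CUDA (set before GPU use).
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```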
- A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324] (arXiv, 2023-07-04)
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
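The summary does not spell out the estimator, so as background, here is a generic double machine learning sketch for the partially linear model (after Chernozhukov et al.), using cross-fitting so that nuisance-model overfitting does not bias the effect estimate; the paper's specific validity tests are not reproduced here.

```python
# Generic double-ML sketch (partially linear model), for background only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_effect(X, d, y, n_folds=5, seed=0):
    """Estimate theta in y = theta*d + g(X) + noise via residual-on-residual."""
    d_res, y_res = np.empty_like(d), np.empty_like(y)
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Fit nuisance models out-of-fold (cross-fitting).
        g = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - g.predict(X[test])
        d_res[test] = d[test] - m.predict(X[test])
    return float(d_res @ y_res / (d_res @ d_res))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
d = X[:, 0] + rng.normal(size=500)                  # confounded treatment
y = 2.0 * d + X[:, 0] ** 2 + rng.normal(size=500)   # true effect is 2.0
print(dml_effect(X, d, y))  # should land near 2.0
```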
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437] (arXiv, 2023-06-08)
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
- Testing Causality in Scientific Modelling Software [0.26388783516590225] (arXiv, 2022-09-01)
The Causal Testing Framework uses causal inference techniques to establish causal effects from existing data.
We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes.
- Observing Interventions: A logic for thinking about experiments [62.997667081978825] (arXiv, 2021-11-25)
This paper makes a first step towards a logic of learning from experiments.
Crucial for our approach is the idea that the notion of an intervention can be used as a formal expression of a (real or hypothetical) experiment.
For all the proposed logical systems, we provide a sound and complete axiomatization.
- A Guide to Reproducible Research in Signal Processing and Machine Learning [9.69596041242667] (arXiv, 2021-08-27)
In 2016, a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments.
We aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.
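One such practical tool is recording an environment snapshot alongside every result, so that a failed reproduction can at least be attributed. The following is a minimal sketch under our own assumptions; the guide's concrete recommendations may differ.

```python
# Hedged sketch: store environment metadata next to experiment results.
import json
import platform
import sys

import numpy as np

def environment_snapshot() -> dict:
    """Capture the facts most often needed to re-run an experiment."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "numpy": np.__version__,
    }

# Hypothetical result values, shown only to illustrate the record format.
record = {"accuracy": 0.93, "seed": 0, "env": environment_snapshot()}
with open("run_metadata.json", "w") as f:
    json.dump(record, f, indent=2)
```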
- Optimal Learning for Sequential Decisions in Laboratory Experimentation [0.0] (arXiv, 2020-04-11)
This tutorial aims to provide experimental scientists with a foundation in the science of making decisions.
We introduce the concept of a learning policy, and review the major categories of policies.
We then introduce a policy, known as the knowledge gradient, that maximizes the value of information from each experiment.
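For independent normal beliefs with known measurement noise, the knowledge gradient has a closed form (this is the textbook version due to Frazier and Powell, not necessarily the exact formulation in the tutorial): measure the alternative whose single observation most improves the best posterior mean.

```python
# Textbook knowledge-gradient sketch for independent normal beliefs.
import numpy as np
from scipy.stats import norm

def knowledge_gradient(mu, sigma2, noise2):
    """nu[i] = expected gain in the max posterior mean from sampling i once."""
    mu, sigma2 = np.asarray(mu, float), np.asarray(sigma2, float)
    # Std. dev. of the change in the posterior mean after one observation.
    sigma_tilde = sigma2 / np.sqrt(sigma2 + noise2)
    nu = np.empty_like(mu)
    for i in range(mu.size):
        best_other = np.max(np.delete(mu, i))
        z = -abs(mu[i] - best_other) / sigma_tilde[i]
        nu[i] = sigma_tilde[i] * (z * norm.cdf(z) + norm.pdf(z))
    return nu

mu = [1.0, 1.2, 0.8]      # current belief means for three experiments
sigma2 = [0.5, 0.1, 1.0]  # belief variances (uncertainty about each mean)
print(np.argmax(knowledge_gradient(mu, sigma2, noise2=0.2)))  # next to run
```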
- A Survey on Causal Inference [64.45536158710014] (arXiv, 2020-02-05)
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
This list is automatically generated from the titles and abstracts of the papers on this site.