Examining the Effect of Implementation Factors on Deep Learning
Reproducibility
- URL: http://arxiv.org/abs/2312.06633v1
- Date: Mon, 11 Dec 2023 18:51:13 GMT
- Title: Examining the Effect of Implementation Factors on Deep Learning
Reproducibility
- Authors: Kevin Coakley, Christine R. Kirkpatrick, Odd Erik Gundersen
- Abstract summary: Three deep learning experiments were run five times each on 13 different hardware environments and four different software environments.
There was a greater than 6% accuracy range on the same deterministic examples, introduced by hardware or software environment variations alone.
- Score: 1.4295431367554867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reproducing published deep learning papers to validate their conclusions can
be difficult due to sources of irreproducibility. We investigate the impact
that implementation factors have on the results and how they affect
reproducibility of deep learning studies. Three deep learning experiments were
run five times each on 13 different hardware environments and four different
software environments. The analysis of the 780 combined results showed that
there was a greater than 6% accuracy range on the same deterministic examples
introduced by hardware or software environment variations alone. To account
for these implementation factors, researchers should run their experiments
multiple times in different hardware and software environments to verify their
conclusions are not affected.
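The recommendation above, repeating runs and recording the hardware/software environment alongside each result, can be sketched as follows. This is a minimal illustration, not code from the paper: `environment_fingerprint` and `summarize_runs` are hypothetical helper names, and the accuracy values are made up for the example.

```python
import platform
import statistics

def environment_fingerprint():
    """Record the software environment alongside each result so that runs
    from different machines can be compared later. In practice, hardware
    details (GPU model, driver/CUDA version) would be added here too."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }

def summarize_runs(accuracies):
    """Summarize repeated runs of the same experiment, reporting the
    accuracy range -- the statistic the paper uses to show that environment
    variation alone can shift results by more than 6%."""
    return {
        "mean": statistics.mean(accuracies),
        "range": max(accuracies) - min(accuracies),
        "n_runs": len(accuracies),
    }

# Five hypothetical runs of one experiment in a single environment.
runs = [0.912, 0.907, 0.915, 0.909, 0.913]
print(environment_fingerprint())
print(summarize_runs(runs))
```

Comparing `summarize_runs` output across environments (rather than within one) is what surfaces the implementation-factor variation the paper measures.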
Related papers
- Contexts Matter: An Empirical Study on Contextual Influence in Fairness Testing for Deep Learning Systems [3.077531983369872]
We aim to understand how varying contexts affect fairness testing outcomes.
Our results show that different context types and settings generally have a significant impact on testing outcomes.
arXiv Detail & Related papers (2024-08-12T12:36:06Z)
- Assessing effect sizes, variability, and power in the on-line study of language production [0.0]
We compare response time data obtained in the same word production experiment conducted in the lab and on-line.
We determine whether the two settings differ in effect sizes, in the consistency of responses over the course of the experiment.
We assess the impact of these differences on the power of the design in a series of simulations.
arXiv Detail & Related papers (2024-03-19T11:49:03Z)
- Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations [17.824339932321788]
We identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results.
Our findings indicate that all 10 hazards have the potential to invalidate experimental findings.
We propose a set of 10 good empirical practices that can mitigate the impact of these hazards.
arXiv Detail & Related papers (2023-09-11T11:05:34Z)
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z)
- Pitfalls in Experiments with DNN4SE: An Analysis of the State of the Practice [0.7614628596146599]
We conduct a mapping study, examining 194 experiments with techniques that rely on deep neural networks appearing in 55 papers published in premier software engineering venues.
Our study reveals that most of the experiments, including those that have received ACM artifact badges, have fundamental limitations that raise doubts about the reliability of their findings.
arXiv Detail & Related papers (2023-05-19T09:55:48Z)
- PyExperimenter: Easily distribute experiments and track results [63.871474825689134]
PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms.
It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
arXiv Detail & Related papers (2023-01-16T10:43:02Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- Sources of Irreproducibility in Machine Learning: A Review [3.905855359082687]
There exists no theoretical framework that relates experiment design choices to their potential effects on conclusions.
The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false findings.
arXiv Detail & Related papers (2022-04-15T18:26:03Z)
- On Inductive Biases for Heterogeneous Treatment Effect Estimation [91.3755431537592]
We investigate how to exploit structural similarities of an individual's potential outcomes (POs) under different treatments.
We compare three end-to-end learning strategies to overcome this problem.
arXiv Detail & Related papers (2021-06-07T16:30:46Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Sentiment Analysis Based on Deep Learning: A Comparative Study [69.09570726777817]
The study of public opinion can provide us with valuable information.
The efficiency and accuracy of sentiment analysis are hindered by the challenges of natural language processing.
This paper reviews the latest studies that have employed deep learning to solve sentiment analysis problems.
arXiv Detail & Related papers (2020-06-05T16:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.