Reproducibility in Machine Learning-Driven Research
- URL: http://arxiv.org/abs/2307.10320v1
- Date: Wed, 19 Jul 2023 07:00:22 GMT
- Title: Reproducibility in Machine Learning-Driven Research
- Authors: Harald Semmelrock and Simone Kopeinik and Dieter Theiler and Tony Ross-Hellauer and Dominik Kowald
- Abstract summary: Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce.
This is also the case in machine learning (ML) and artificial intelligence (AI) research.
Although different solutions to this issue, such as using ML platforms, are discussed in the research community, the level of reproducibility in ML-driven research is not increasing substantially.
- Score: 1.7936835766396748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research is facing a reproducibility crisis, in which the results and
findings of many studies are difficult or even impossible to reproduce. This is
also the case in machine learning (ML) and artificial intelligence (AI)
research. Often, this is due to unpublished data and/or source code,
and due to sensitivity to ML training conditions. Although different solutions
to address this issue are discussed in the research community such as using ML
platforms, the level of reproducibility in ML-driven research is not increasing
substantially. Therefore, in this mini survey, we review the literature on
reproducibility in ML-driven research with three main aims: (i) reflect on the
current situation of ML reproducibility in various research fields, (ii)
identify reproducibility issues and barriers that exist in these research
fields applying ML, and (iii) identify potential drivers such as tools,
practices, and interventions that support ML reproducibility. With this, we
hope to contribute to decisions on the viability of different solutions for
supporting ML reproducibility.
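The abstract names sensitivity to ML training conditions as one cause of irreproducibility. As a minimal illustration (not taken from the paper), the Python sketch below assumes a toy setting: a one-parameter linear model fitted by stochastic gradient descent, where data noise, initialization and shuffling order are the only sources of randomness. Routing all of them through a single seeded `random.Random` instance makes a run repeatable. The function `train_toy_model` and its parameters are hypothetical names. Real ML frameworks add further sources of variation (library-specific random generators, non-deterministic GPU kernels) that this sketch does not cover.

```python
import random


def train_toy_model(seed: int, epochs: int = 20, lr: float = 0.05) -> float:
    """Fit y = w * x with SGD on a tiny synthetic dataset (hypothetical toy example).

    Every source of randomness (data noise, initial weight, shuffling order)
    draws from one generator seeded with `seed`, so the run is repeatable.
    """
    rng = random.Random(seed)  # isolated generator, no shared global state
    xs = [i / 10 for i in range(1, 21)]
    # Synthetic data: true slope 3.0 plus Gaussian noise drawn from the seeded generator.
    data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
    w = rng.uniform(-1.0, 1.0)  # random initialization
    for _ in range(epochs):
        rng.shuffle(data)  # training-order randomness, also seeded
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
            w -= lr * grad
    return w


if __name__ == "__main__":
    a = train_toy_model(seed=42)
    b = train_toy_model(seed=42)
    c = train_toy_model(seed=7)
    print(a == b)  # True: same seed, same training conditions
    print(a, c)    # a different seed gives a slightly different weight
```

Reporting the seed together with the published code and data lets others rerun the same configuration, while varying only the seed indicates how much a reported result depends on it.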
Related papers
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature from various ML domains with a consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
Date: 2024-07-17T20:01:21Z
- Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers [1.4841630983274845]
Research in various fields is currently experiencing challenges regarding the reproducibility of results.
This problem is also prevalent in machine learning (ML) research.
The level of reproducibility in ML-driven research remains unsatisfactory.
Date: 2024-06-20T13:56:42Z
- MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
Date: 2024-02-21T14:22:20Z
- Exploring Perceptual Limitation of Multimodal Large Language Models [57.567868157293994]
We quantitatively study the perception of small visual objects in several state-of-the-art MLLMs.
We identify four independent factors that can contribute to this limitation.
Lower object quality and smaller object size can both independently reduce MLLMs' ability to answer visual questions.
Date: 2024-02-12T03:04:42Z
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
Date: 2023-10-23T13:35:24Z
- Lost in Translation: Reimagining the Machine Learning Life Cycle in Education [12.802237736747077]
Machine learning (ML) techniques are increasingly prevalent in education.
There is a pressing need to investigate how ML techniques support long-standing education principles and goals.
In this work, we shed light on this complex landscape drawing on qualitative insights from interviews with education experts.
Date: 2022-09-08T17:14:01Z
- REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine Learning Research [19.71032778307425]
Transparency around limitations can improve the scientific rigor of research, help ensure appropriate interpretation of research findings, and make research claims more credible.
Despite these benefits, the machine learning (ML) research community lacks well-developed norms around disclosing and discussing limitations.
We conduct an iterative design process with 30 ML and ML-adjacent researchers to develop REAL ML, a set of guided activities to help ML researchers recognize, explore, and articulate the limitations of their research.
Date: 2022-05-05T15:32:45Z
- The challenge of reproducible ML: an empirical study on the impact of bugs [6.862925771672299]
In this paper, we establish the fundamental factors that cause non-determinism in Machine Learning systems.
A framework, ReproduceML, is then introduced for deterministic evaluation of ML experiments in a real, controlled environment.
This study attempts to quantify the impact that the occurrence of bugs in a popular ML framework, PyTorch, has on the performance of trained models.
Date: 2021-09-09T01:36:39Z
- Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
Date: 2021-03-02T22:50:45Z
- Machine Learning Towards Intelligent Systems: Applications, Challenges, and Opportunities [8.68311678910946]
Machine learning (ML) provides a mechanism for humans to process large amounts of data.
This review focuses on some of the fields and applications such as education, healthcare, network security, banking and finance, and social media.
Date: 2021-01-11T01:32:15Z
- Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
Date: 2020-08-05T15:45:54Z