Position: More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research
- URL: http://arxiv.org/abs/2502.00902v1
- Date: Sun, 02 Feb 2025 20:29:09 GMT
- Title: Position: More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research
- Authors: Moritz Wolter, Lokesh Veeramacheneni
- Abstract summary: Experimental verification and falsification of scholarly work are part of the scientific method's core.
To improve the Machine Learning (ML) community's ability to verify results from prior work, we argue for more robust software engineering.
- Score: 1.0128808054306186
- Abstract: Experimental verification and falsification of scholarly work are part of the scientific method's core. To improve the Machine Learning (ML) community's ability to verify results from prior work, we argue for more robust software engineering. We estimate the adoption of common engineering best practices by examining repository links from all recently accepted International Conference on Machine Learning (ICML), International Conference on Learning Representations (ICLR) and Neural Information Processing Systems (NeurIPS) papers, as well as ICML papers over time. Based on the results, we recommend how we, as a community, can improve reproducibility in ML research.
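The abstract does not spell out which engineering best practices the authors check for, so the following is only a minimal sketch of how such an adoption estimate could be made: it scans a locally cloned repository for a handful of commonly cited markers (a test suite, CI configuration, packaging metadata, a dependency list, a license). The `MARKERS` table, the `check_repo` helper, and the placeholder path are all assumptions for illustration, not the paper's actual methodology.

```python
import os

# Hypothetical markers of common engineering best practices. The paper's
# exact criteria are not given in the abstract, so this list is an assumption.
MARKERS = {
    "tests": ("tests", "test"),                     # unit-test directory
    "ci": (".github/workflows", ".gitlab-ci.yml"),  # continuous integration
    "packaging": ("setup.py", "pyproject.toml"),    # installable package
    "dependencies": ("requirements.txt", "environment.yml"),
    "license": ("LICENSE", "LICENSE.md"),
}

def check_repo(repo_path: str) -> dict[str, bool]:
    """Report which best-practice markers a cloned repository contains."""
    return {
        name: any(os.path.exists(os.path.join(repo_path, p)) for p in paths)
        for name, paths in MARKERS.items()
    }

if __name__ == "__main__":
    report = check_repo("path/to/cloned/repo")  # placeholder path
    adoption = sum(report.values()) / len(report)
    print(report)
    print(f"best-practice adoption: {adoption:.0%}")
```

Averaging such per-repository reports over all accepted papers with repository links would yield the kind of adoption estimate the abstract describes.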
Related papers
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval enhancement can be extended to a broader spectrum of machine learning (ML) problems.
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains under a consistent notation, which the current literature lacks.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks [2.8061460833143346]
Large Language Models (LLMs) are rapidly becoming ubiquitous, both as stand-alone tools and as components of current and future software systems.
To enable the use of LLMs in the high-stakes or safety-critical systems of 2030, they need to undergo rigorous testing.
arXiv Detail & Related papers (2024-06-12T13:45:45Z) - Naming the Pain in Machine Learning-Enabled Systems Engineering [8.092979562919878]
Machine learning (ML)-enabled systems are increasingly being adopted by companies.
This paper aims to deliver a comprehensive overview of the current state of engineering ML-enabled systems.
arXiv Detail & Related papers (2024-05-20T06:59:20Z) - Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as assistants that help users accomplish their jobs, and they also support the development of advanced applications.
For the wide application of LLMs, inference efficiency is an essential concern, and it has been widely studied in existing work.
We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z) - Machine Learning State-of-the-Art with Uncertainties [3.4123736336071864]
We conduct an exemplary image classification study to demonstrate how confidence intervals around accuracy measurements can greatly enhance the communication of research results; a minimal sketch of one such interval follows this list.
We make suggestions for improving the authoring and reviewing process of machine learning articles.
arXiv Detail & Related papers (2022-04-11T15:06:26Z) - Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z) - Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z) - Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning to transition modern-day software systems towards highly intelligent and self-learning systems.
However, no comprehensive study exists that explores the current state of the art in the adoption of machine learning across software engineering life cycle stages.
This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z) - Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program) [43.55295847227261]
Reproducibility is obtaining similar results as presented in a paper or talk, using the same code and data (when available).
In 2019, the Neural Information Processing Systems (NeurIPS) conference introduced a reproducibility program designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research, comprising a code submission policy, a community-wide reproducibility challenge, and a reproducibility checklist.
In this paper, we describe each of these components, how they were deployed, and what we were able to learn from this initiative.
arXiv Detail & Related papers (2020-03-27T02:16:25Z)
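As a companion to the uncertainty entry above: one standard way to attach a confidence interval to a test-set accuracy is to treat correct predictions as a binomial proportion and compute a Wilson score interval. Whether the cited study uses this exact estimator is not stated in its summary, so the sketch below is only an illustrative assumption, including the example counts.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion, e.g. the accuracy
    of a classifier on a held-out test set. z=1.96 gives a 95% interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return center - half, center + half

# Hypothetical example: 9,123 correct predictions out of 10,000 test images.
low, high = wilson_interval(9123, 10_000)
print(f"accuracy = 91.23%, 95% CI = [{low:.4f}, {high:.4f}]")
```

Reporting the interval alongside the point estimate makes clear whether a claimed improvement over prior work exceeds the measurement uncertainty.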
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.