Position: More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research
- URL: http://arxiv.org/abs/2502.00902v1
- Date: Sun, 02 Feb 2025 20:29:09 GMT
- Title: Position: More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research
- Authors: Moritz Wolter, Lokesh Veeramacheneni,
- Abstract summary: Experimental verification and falsification of scholarly work are part of the scientific method's core.<n>To improve the Machine Learning (ML)-communities' ability to verify results from prior work, we argue for more robust software engineering.
- Score: 1.0128808054306186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experimental verification and falsification of scholarly work are part of the scientific method's core. To improve the Machine Learning (ML)-communities' ability to verify results from prior work, we argue for more robust software engineering. We estimate the adoption of common engineering best practices by examining repository links from all recently accepted International Conference on Machine Learning (ICML), International Conference on Learning Representations (ICLR) and Neural Information Processing Systems (NeurIPS) papers as well as ICML papers over time. Based on the results, we recommend how we, as a community, can improve reproducibility in ML-research.
Related papers
- MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [66.87201770167012]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions.<n>Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods.<n>Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
arXiv Detail & Related papers (2025-04-13T19:35:43Z) - Perspective of Software Engineering Researchers on Machine Learning Practices Regarding Research, Review, and Education [12.716955305620191]
This study aims to contribute to the knowledge, about the synergy between Machine Learning (ML) and Software Engineering (SE)<n>We analyzed SE researchers familiar with ML or who authored SE articles using ML, along with the articles themselves.<n>We found diverse practices focusing on data collection, model training, and evaluation.
arXiv Detail & Related papers (2024-11-28T18:21:24Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML)
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks [2.8061460833143346]
Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems.
To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to undergo rigorous testing.
arXiv Detail & Related papers (2024-06-12T13:45:45Z) - Naming the Pain in Machine Learning-Enabled Systems Engineering [8.092979562919878]
Machine learning (ML)-enabled systems are being increasingly adopted by companies.
This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems.
arXiv Detail & Related papers (2024-05-20T06:59:20Z) - Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as the assistant to help users accomplish their jobs, and also support the development of advanced applications.
For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work.
We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z) - Wildest Dreams: Reproducible Research in Privacy-preserving Neural
Network Training [2.853180143237022]
This work focuses on the ML model's training phase, where maintaining user data privacy is of utmost importance.
We provide a solid theoretical background that eases the understanding of current approaches.
We reproduce results for some of the papers and examine at what level existing works in the field provide support for open science.
arXiv Detail & Related papers (2024-03-06T10:25:36Z) - MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal overhead while ensuring a high level of practitioner overhead.
arXiv Detail & Related papers (2024-02-21T14:22:20Z) - Using Machine Learning To Identify Software Weaknesses From Software
Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - Towards machine learning guided by best practices [0.0]
Machine learning (ML) is being used in software systems with multiple application fields, from medicine to software engineering (SE)
This thesis aims to answer research questions that help to understand the practices used and discussed by practitioners and researchers in the SE community.
arXiv Detail & Related papers (2023-04-29T10:58:37Z) - Machine Learning for Software Engineering: A Tertiary Study [13.832268599253412]
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities.
We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009-2022, covering 6,117 primary studies.
The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML.
arXiv Detail & Related papers (2022-11-17T09:19:53Z) - Machine Learning State-of-the-Art with Uncertainties [3.4123736336071864]
We conduct an exemplary image classification study in order to demonstrate how confidence intervals around accuracy measurements can greatly enhance the communication of research results.
We make suggestions for improving the authoring and reviewing process of machine learning articles.
arXiv Detail & Related papers (2022-04-11T15:06:26Z) - Machine Learning Application Development: Practitioners' Insights [18.114724750441724]
We report about a survey that aimed to understand the challenges and best practices of ML application development.
We synthesize the results obtained from 80 practitioners into 17 findings; outlining challenges and best practices for ML application development.
We hope that the reported challenges will inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.
arXiv Detail & Related papers (2021-12-31T03:38:37Z) - Practical Machine Learning Safety: A Survey and Primer [81.73857913779534]
Open-world deployment of Machine Learning algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities.
New models and training techniques to reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks.
Our organization maps state-of-the-art ML techniques to safety strategies in order to enhance the dependability of the ML algorithm from different aspects.
arXiv Detail & Related papers (2021-06-09T05:56:42Z) - Understanding the Usability Challenges of Machine Learning In
High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z) - Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z) - Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z) - Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems.
No comprehensive study exists that explores the current state-of-the-art on the adoption of machine learning across software engineering life cycle stages.
This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z) - Improving Reproducibility in Machine Learning Research (A Report from
the NeurIPS 2019 Reproducibility Program) [43.55295847227261]
Reproducibility is obtaining similar results as presented in a paper or talk, using the same code and data (when available)
In 2019, the Neural Information Processing Systems (NeurIPS) conference introduced a program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research.
In this paper, we describe each of these components, how they were deployed, as well as what we were able to learn from this initiative.
arXiv Detail & Related papers (2020-03-27T02:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.