An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
- URL: http://arxiv.org/abs/2311.12019v2
- Date: Sun, 9 Jun 2024 14:11:26 GMT
- Title: An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
- Authors: Aaditya Bhatia, Foutse Khomh, Bram Adams, Ahmed E. Hassan
- Abstract summary: Self-admitted technical debt (SATD) can have a significant impact on the quality of machine learning-based software.
This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects.
- Score: 17.999512016809945
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The emergence of open-source ML libraries such as TensorFlow and Google AutoML has enabled developers to harness state-of-the-art ML algorithms with minimal overhead. However, during this accelerated ML development process, these developers may make sub-optimal design and implementation decisions, introducing technical debt that, if not addressed promptly, can significantly impact the quality of the ML-based software. Developers frequently acknowledge these sub-optimal choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper investigates SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in source code comments throughout the different project snapshots, conducted a manual analysis of the identified SATD sample to understand the nature of technical debt in ML code, and performed a survival analysis of the SATD to understand how such debts evolve. We observed that: i) machine learning projects have a median percentage of SATD that is twice that of non-machine-learning projects; ii) ML pipeline components for data preprocessing and model generation logic are more susceptible to debt than model validation and deployment components; iii) SATD appears in ML projects earlier in the development process than in non-ML projects; iv) long-lasting SATD is typically introduced during extensive code changes that span multiple low-complexity files.
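The detect-then-track methodology described in the abstract can be illustrated with a minimal sketch. The paper's actual detector and statistical setup are not specified here, so the keyword patterns below (the common task-annotation-tag heuristic) and the use of the lifelines library are illustrative assumptions, not the authors' implementation.
```python
import re
from lifelines import KaplanMeierFitter

# Assumed SATD indicator tags (common in the SATD literature); the
# paper's actual detector is not specified in the abstract.
SATD_PATTERNS = re.compile(r"\b(todo|fixme|hack|xxx|workaround)\b", re.IGNORECASE)

def find_satd_comments(source: str) -> list[str]:
    """Return '#' comments that self-admit technical debt (naive scan:
    does not distinguish '#' inside string literals)."""
    comments = [line.split("#", 1)[1].strip()
                for line in source.splitlines() if "#" in line]
    return [c for c in comments if SATD_PATTERNS.search(c)]

# Survival analysis of SATD lifetimes: how long does a debt comment live
# before it is removed? Durations are days between the snapshot that
# introduced the comment and the one that removed it; comments still
# present at the last snapshot are right-censored (removed = 0).
durations = [120, 30, 400, 75, 210]   # toy data, not from the paper
removed   = [1,   1,  0,   1,  0]     # 1 = SATD was eventually removed

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=removed)
print(kmf.median_survival_time_)      # median SATD lifetime in days
```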
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complex than canonical solutions.
We develop a taxonomy of bugs for incorrect code, comprising three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
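A minimal sketch of such a critique-and-repair loop follows. The `generate` callable stands in for any LLM completion call, and compiler feedback is approximated with Python's built-in `compile`; the prompts are illustrative assumptions, not the paper's actual protocol.
```python
from typing import Callable

def critique_and_repair(generate: Callable[[str], str],
                        task: str, max_rounds: int = 3) -> str:
    """Iteratively ask a model to fix its own code using compiler feedback.

    `generate` is a caller-supplied stand-in for an LLM API call.
    """
    code = generate(f"Write Python code for: {task}")
    for _ in range(max_rounds):
        try:
            compile(code, "<candidate>", "exec")  # cheap syntax-level check
            return code                           # accept on a clean compile
        except SyntaxError as err:
            feedback = f"line {err.lineno}: {err.msg}"
            code = generate(
                f"Task: {task}\nYour code:\n{code}\n"
                f"Compiler feedback: {feedback}\n"
                "Critique the code and return a corrected version."
            )
    return code  # best effort after max_rounds of self-critique
```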
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
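One of the simplest sequence-level baselines such a benchmark can include is a perplexity-style score over the model's own token log-probabilities; a minimal sketch (whether this exact normalization matches LM-Polygraph's baselines is an assumption):
```python
import numpy as np

def mean_token_nll(token_logprobs: list[float]) -> float:
    """Length-normalized negative log-likelihood of a generated sequence.

    Higher values mean the model was less confident in its own output.
    """
    return -float(np.mean(token_logprobs))

# Toy usage: log-probabilities the model assigned to its generated tokens.
print(mean_token_nll([-0.1, -0.05, -2.3, -0.4]))  # 0.7125
```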
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate the resource demands of large language models by compressing and accelerating them.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
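The core operation every such toolkit builds on is mapping floating-point weights to low-bit integers with a scale factor. A generic sketch, not LLMC's actual API:
```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization (generic illustration)."""
    scale = np.abs(w).max() / 127.0           # map the largest |weight| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale       # approximate reconstruction

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())     # worst-case quantization error
```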
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions [1.5197353881052764]
We conduct a comprehensive analysis of 185 open-source projects on GitHub (93 ML and 92 non-ML projects).
Our investigation comprises both quantitative and qualitative dimensions, aiming to uncover differences in CI adoption between ML and non-ML projects.
Our findings indicate that ML projects often require longer build durations, and medium-sized ML projects exhibit lower test coverage compared to non-ML projects.
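The kind of build-duration measurement behind such findings can be approximated from the GitHub REST API. A rough sketch, assuming the public /actions/runs endpoint; treating updated_at minus run_started_at as build duration is an approximation:
```python
from datetime import datetime
import requests

def build_durations(owner: str, repo: str) -> list[float]:
    """Approximate CI build durations (seconds) for recent workflow runs."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/runs"
    runs = requests.get(url, timeout=30).json().get("workflow_runs", [])
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return [
        (datetime.strptime(r["updated_at"], fmt)
         - datetime.strptime(r["run_started_at"], fmt)).total_seconds()
        for r in runs if r.get("status") == "completed"
    ]

# Toy usage (unauthenticated requests are rate-limited by GitHub):
# print(build_durations("tensorflow", "tensorflow")[:5])
```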
arXiv Detail & Related papers (2024-03-14T16:35:39Z)
- Investigating the Impact of SOLID Design Principles on Machine Learning Code Understanding [2.5788518098820337]
We investigated the impact of the SOLID design principles on Machine Learning code understanding.
We restructured real industrial ML code that did not use SOLID principles.
Results provided statistically significant evidence that the adoption of the SOLID design principles can improve code understanding.
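What such a restructuring can look like in ML code, as a sketch of the Single Responsibility and Dependency Inversion principles (illustrative only, not the study's actual material):
```python
from typing import Protocol

# Before (conceptually): one function loads data, trains, evaluates, and
# logs -- the kind of multi-responsibility ML code the study restructured.

# After: each class has one reason to change, and the trainer depends on
# an abstraction rather than a concrete model (Dependency Inversion).
class Model(Protocol):
    def fit(self, X, y) -> None: ...
    def score(self, X, y) -> float: ...

class Trainer:
    """Single responsibility: orchestrate training against any Model."""
    def __init__(self, model: Model):
        self.model = model

    def run(self, X_train, y_train, X_test, y_test) -> float:
        self.model.fit(X_train, y_train)
        return self.model.score(X_test, y_test)

# Any scikit-learn estimator already satisfies the Model protocol:
# Trainer(LogisticRegression()).run(X_train, y_train, X_test, y_test)
```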
arXiv Detail & Related papers (2024-02-08T00:44:45Z)
- Self-Admitted Technical Debt Detection Approaches: A Decade Systematic Review [5.670597842524448]
Technical debt (TD) represents the long-term costs associated with suboptimal design or code decisions in software development.
Self-Admitted Technical Debt (SATD) occurs when developers explicitly acknowledge these trade-offs.
Automated detection of SATD has become an increasingly important research area.
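The approaches surveyed in this literature range from keyword matching (as in the earlier sketch) to supervised text classifiers. A minimal sketch of the latter, with toy labeled comments standing in for the thousands a real study would use:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled comments (1 = SATD); real corpora are far larger.
comments = [
    "TODO: remove this hack once the upstream bug is fixed",
    "temporary workaround, revisit before release",
    "compute the rolling mean over the window",
    "parse the config file and return a dict",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(comments, labels)
# Classify an unseen comment (expect SATD = 1 on this toy fit).
print(clf.predict(["FIXME: this breaks on empty input"]))
```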
arXiv Detail & Related papers (2023-12-19T12:01:13Z)
- SEED-Bench-2: Benchmarking Multimodal Large Language Models [67.28089415198338]
Multimodal large language models (MLLMs) have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs.
SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, spanning 27 dimensions.
We evaluate the performance of 23 prominent open-source MLLMs and summarize valuable observations.
arXiv Detail & Related papers (2023-11-28T05:53:55Z)
- "Project smells" -- Experiences in Analysing the Software Quality of ML Projects with mllint [6.0141405230309335]
We introduce the novel concept of project smells, which considers deficits in project management as part of a more holistic perspective on software quality.
An open-source static analysis tool, mllint, was also implemented to help detect and mitigate these smells.
Our findings indicate a need for context-aware static analysis tools that fit the needs of a project at its current stage of development.
arXiv Detail & Related papers (2022-01-20T15:52:24Z)
- Characterizing and Detecting Mismatch in Machine-Learning-Enabled Systems [1.4695979686066065]
The development and deployment of machine learning systems remain challenging.
In this paper, we report our findings and their implications for improving end-to-end ML-enabled system development.
arXiv Detail & Related papers (2021-03-25T19:40:29Z)
- Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.