Comparative analysis of real bugs in open-source Machine Learning
projects -- A Registered Report
- URL: http://arxiv.org/abs/2209.09932v1
- Date: Tue, 20 Sep 2022 18:12:12 GMT
- Title: Comparative analysis of real bugs in open-source Machine Learning
projects -- A Registered Report
- Authors: Tuan Dung Lai, Anj Simmons, Scott Barnett, Jean-Guy Schneider, Rajesh
Vasa
- Abstract summary: We investigate whether there is a discrepancy in the distribution of resolution time between Machine Learning and non-ML issues.
We measure the resolution time and size of fix of ML and non-ML issues on a controlled sample and compare the distributions for each category of issue.
- Score: 5.275804627373337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Machine Learning (ML) systems rely on data to make predictions, and
they have many added components compared to traditional software systems
such as the data processing pipeline, serving pipeline, and model training.
Existing research on software maintenance has studied the issue-reporting needs
and resolution process for different types of issues, such as performance and
security issues. However, ML systems have specific classes of faults, and
reporting ML issues requires domain-specific information. Because of the
different characteristics between ML and traditional Software Engineering
systems, we do not know to what extent the reporting needs are different, and
to what extent these differences impact the issue resolution process.
Objective: Our objective is to investigate whether there is a discrepancy in
the distribution of resolution time between ML and non-ML issues and whether
certain categories of ML issues require a longer time to resolve based on real
issue reports in open-source applied ML projects. We further investigate the
size of fix of ML issues and non-ML issues. Method: We extract issue reports,
pull requests and code files from recently active applied ML projects on GitHub,
and use an automatic approach to filter ML and non-ML issues. We manually label
the issues using a known taxonomy of deep learning bugs. We measure the
resolution time and size of fix of ML and non-ML issues on a controlled sample
and compare the distributions for each category of issue.
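The measurement step described in the Method can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each issue carries `created_at` and `closed_at` timestamps in the style returned by the GitHub REST API, and it uses the Mann-Whitney U statistic as one possible way to compare two distributions (the abstract does not name a statistical test). The helper names and the sample timestamps are hypothetical.

```python
from datetime import datetime
from statistics import median

GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamp style used by the GitHub REST API


def resolution_days(created_at, closed_at):
    """Return the days elapsed between opening and closing an issue."""
    opened = datetime.strptime(created_at, GITHUB_TIME_FORMAT)
    closed = datetime.strptime(closed_at, GITHUB_TIME_FORMAT)
    return (closed - opened).total_seconds() / 86400


def mann_whitney_u(xs, ys):
    """Count pairs (x, y) with x > y; ties count as half a pair."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Made-up (created_at, closed_at) pairs for illustration only; not data from the paper.
ml_issues = [
    ("2022-01-01T00:00:00Z", "2022-01-11T00:00:00Z"),
    ("2022-02-01T00:00:00Z", "2022-02-21T12:00:00Z"),
]
non_ml_issues = [
    ("2022-01-05T00:00:00Z", "2022-01-07T00:00:00Z"),
    ("2022-03-01T00:00:00Z", "2022-03-04T00:00:00Z"),
]

ml_days = [resolution_days(c, d) for c, d in ml_issues]
non_ml_days = [resolution_days(c, d) for c, d in non_ml_issues]

print("median ML:", median(ml_days), "median non-ML:", median(non_ml_days))
print("U statistic (ML vs non-ML):", mann_whitney_u(ml_days, non_ml_days))
```

A U value close to the number of pairs (here 2 x 2 = 4) means the first sample tends to have longer resolution times; a real analysis would also need a significance test and a much larger, controlled sample.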
Related papers
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML).
VML constrains the parameter space to be human-interpretable natural language.
We conduct several studies to empirically evaluate the effectiveness of VML.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- Understanding Information Storage and Transfer in Multi-modal Large Language Models [51.20840103605018]
We study how Multi-modal Large Language Models process information in a factual visual question answering task.
Key findings show that these MLLMs rely on self-attention blocks in much earlier layers for information storage.
We introduce MultEdit, a model-editing algorithm that can correct errors and insert new long-tailed information into MLLMs.
arXiv Detail & Related papers (2024-06-06T16:35:36Z)
- Efficient Multimodal Large Language Models: A Survey [60.7614299984182]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.
The extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry.
This survey provides a comprehensive and systematic review of the current state of efficient MLLMs.
arXiv Detail & Related papers (2024-05-17T12:37:10Z)
- When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems [13.718420553401662]
We aim to investigate the emergence and evolution of specific types of quality-related concerns known as ML-specific code smells.
More specifically, we present a plan to study ML-specific code smells by empirically analyzing their prevalence in real ML-enabled systems.
We will conduct an exploratory study, mining a large dataset of ML-enabled systems and analyzing over 400k commits about 337 projects.
arXiv Detail & Related papers (2024-03-13T07:43:45Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Bug Characterization in Machine Learning-based Systems [15.521925194920893]
We investigate the characteristics of bugs in Machine Learning-based software systems and the difference between ML and non-ML bugs from the maintenance viewpoint.
Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components.
arXiv Detail & Related papers (2023-07-26T21:21:02Z)
- Bugs in Machine Learning-based Systems: A Faultload Benchmark [16.956588187947993]
There is no standard benchmark of bugs to assess their performance, compare them and discuss their advantages and weaknesses.
In this study, we firstly investigate the verifiability of the bugs in ML-based systems and show the most important factors in each one.
We provide a benchmark, namely defect4ML, that satisfies all criteria of a standard benchmark, i.e. relevance, fairness, verifiability, and usability.
arXiv Detail & Related papers (2022-06-24T14:20:34Z)
- Towards Perspective-Based Specification of Machine Learning-Enabled Systems [1.3406258114080236]
This paper describes our work towards a perspective-based approach for specifying ML-enabled systems.
The approach involves analyzing a set of 45 ML concerns grouped into five perspectives: objectives, user experience, infrastructure, model, and data.
The main contribution of this paper is to provide two new artifacts that can be used to help specify ML-enabled systems.
arXiv Detail & Related papers (2022-06-20T13:09:23Z)
- Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)
- Vamsa: Automated Provenance Tracking in Data Science Scripts [17.53546311589593]
We introduce the ML provenance tracking problem.
We discuss the challenges in capturing such information in the context of Python.
We present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the users' code.
arXiv Detail & Related papers (2020-01-07T02:39:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.