Software issues report for bug fixing process: An empirical study of
machine-learning libraries
- URL: http://arxiv.org/abs/2312.06005v1
- Date: Sun, 10 Dec 2023 21:33:19 GMT
- Title: Software issues report for bug fixing process: An empirical study of
machine-learning libraries
- Authors: Adekunle Ajibode, Dong Yunwei, Yang Hongji
- Abstract summary: We investigated the effectiveness of issue resolution for bug-fixing processes in six machine-learning libraries.
The most common categories of issues that arise in machine-learning libraries are bugs, documentation, optimization, crashes, enhancement, new feature requests, build/CI, support, and performance.
This study concludes that efficient issue-tracking processes, effective communication, and collaboration are vital for effective resolution of issues and bug fixing processes in machine-learning libraries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Issue resolution and bug-fixing processes are essential in the
development of machine-learning libraries, just as in general software
development, to ensure well-optimized functions. Understanding the issue resolution and bug-fixing
process of machine-learning libraries can help developers identify areas for
improvement and optimize their strategies for issue resolution and bug-fixing.
However, detailed studies on this topic are lacking. Therefore, we investigated
the effectiveness of issue resolution for bug-fixing processes in six
machine-learning libraries: TensorFlow, Keras, Theano, PyTorch, Caffe, and
scikit-learn. We addressed seven research questions (RQs) using 16,921 issues
extracted from the libraries' GitHub repositories via the GitHub REST API. We
employed several quantitative methods of data analysis, including correlation,
OLS regression, percentage and frequency counts, and heatmaps, to analyze the RQs. We
found the following through our empirical investigation: (1) The most common
categories of issues that arise in machine-learning libraries are bugs,
documentation, optimization, crashes, enhancement, new feature requests,
build/CI, support, and performance. (2) Effective strategies for addressing
these problems include fixing critical bugs, optimizing performance, and
improving documentation. (3) These categorized issues are related to testing
and runtime and are common among all six machine-learning libraries. (4)
Monitoring the total number of comments on issues can provide insights into the
duration of the issues. (5) It is crucial to strike a balance between
prioritizing critical issues and addressing other issues in a timely manner.
Therefore, this study concludes that efficient issue-tracking processes,
effective communication, and collaboration are vital for effective resolution
of issues and bug fixing processes in machine-learning libraries.
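The study's data collection and analysis pipeline, mining issues through the GitHub REST API and relating comment activity to resolution time (finding 4), can be sketched in a few lines of Python. The snippet below is a minimal illustration only, not the authors' actual pipeline: the repository choice (Keras), the page cap, and the use of `requests` and `statistics.correlation` are assumptions made for this example, while the REST endpoint `GET /repos/{owner}/{repo}/issues` and its `state`, `per_page`, and `page` parameters are part of GitHub's public API.

```python
# Illustrative sketch (not the paper's pipeline): pull closed issues for one of
# the studied libraries via the GitHub REST API and correlate comment counts
# with time-to-close. Requires `requests` and Python 3.10+ (statistics.correlation).
import os
import statistics
from datetime import datetime

import requests

ISSUES_URL = "https://api.github.com/repos/{owner}/{repo}/issues"
TIME_FMT = "%Y-%m-%dT%H:%M:%SZ"


def fetch_closed_issues(owner, repo, max_pages=5):
    """Yield closed issues (pull requests excluded), up to max_pages pages."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")  # optional; raises the rate limit
    if token:
        headers["Authorization"] = f"Bearer {token}"
    for page in range(1, max_pages + 1):
        resp = requests.get(
            ISSUES_URL.format(owner=owner, repo=repo),
            headers=headers,
            params={"state": "closed", "per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        for issue in batch:
            if "pull_request" not in issue:  # the issues endpoint also returns PRs
                yield issue


def resolution_days(issue):
    """Days between an issue's creation and its closing."""
    created = datetime.strptime(issue["created_at"], TIME_FMT)
    closed = datetime.strptime(issue["closed_at"], TIME_FMT)
    return (closed - created).total_seconds() / 86400


if __name__ == "__main__":
    issues = list(fetch_closed_issues("keras-team", "keras"))
    comments = [i["comments"] for i in issues]
    durations = [resolution_days(i) for i in issues]
    # Pearson correlation between comment count and time-to-close, echoing the
    # paper's observation that comment activity tracks issue duration.
    print("issues analysed:", len(issues))
    print("comments-vs-duration correlation:", statistics.correlation(comments, durations))
```

A full replication would go further than this sketch: paging through every issue of all six libraries, respecting API rate limits, categorizing issues from their labels, and fitting the OLS regressions and heatmaps described in the abstract.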
Related papers
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z) - Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as the assistant to help users accomplish their jobs, and also support the development of advanced applications.
For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work.
We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z) - An Empirical Study of Challenges in Machine Learning Asset Management [15.07444988262748]
Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration.
Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms.
We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed.
arXiv Detail & Related papers (2024-02-25T05:05:52Z) - Leveraging Print Debugging to Improve Code Generation in Large Language
Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks.
However, their performance on programming problems involving complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
arXiv Detail & Related papers (2024-01-10T18:37:59Z) - An Empirical Study on Bugs Inside PyTorch: A Replication Study [10.848682558737494]
We characterize bugs in the PyTorch library, a very popular deep learning framework.
Our results highlight that PyTorch bugs resemble bugs in traditional software projects more than bugs tied to deep-learning-specific characteristics.
arXiv Detail & Related papers (2023-07-25T19:23:55Z) - Automatic Static Bug Detection for Machine Learning Libraries: Are We
There Yet? [14.917820383894124]
We analyze five popular and widely used static bug detectors, i.e., Flawfinder, RATS, Cppcheck, Facebook Infer, and Clang, on a curated dataset of software bugs.
Overall, our study shows that the static bug detectors find a negligible share of all bugs, accounting for 6/410 bugs (about 1.5%); Flawfinder and RATS are the most effective static checkers for finding software bugs in machine-learning libraries.
arXiv Detail & Related papers (2023-07-09T01:38:52Z) - LibAUC: A Deep Learning Library for X-Risk Optimization [43.32145407575245]
This paper introduces the award-winning deep learning (DL) library called LibAUC.
LibAUC implements state-of-the-art algorithms towards optimizing a family of risk functions named X-risks.
arXiv Detail & Related papers (2023-06-05T17:43:46Z) - SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z) - What Makes Good Contrastive Learning on Small-Scale Wearable-based
Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z) - What to Prioritize? Natural Language Processing for the Development of a
Modern Bug Tracking Solution in Hardware Development [0.0]
We present an approach to predict the time to fix, the risk and the complexity of a bug report using different supervised machine learning algorithms.
The evaluation shows that a combination of text embeddings generated with the Universal Sentence Encoder outperforms all other methods.
arXiv Detail & Related papers (2021-09-28T15:55:10Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few examples provided by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)