Automated Bug Report Prioritization in Large Open-Source Projects
- URL: http://arxiv.org/abs/2504.15912v1
- Date: Tue, 22 Apr 2025 13:57:48 GMT
- Title: Automated Bug Report Prioritization in Large Open-Source Projects
- Authors: Riley Pierson, Armin Moin
- Abstract summary: We propose a novel approach to automated bug prioritization based on the natural language text of the bug reports. We conduct topic modeling using a variant of LDA called TopicMiner-MTM and text classification with the BERT large language model. Experimental results using an existing reference dataset containing 85,156 bug reports of the Eclipse Platform project indicate that we outperform existing approaches in terms of Accuracy, Precision, Recall, and F1-measure of the bug report priority prediction.
- Score: 3.9134031118910264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large open-source projects receive a large number of issues (known as bugs), including software defect (i.e., bug) reports and new feature requests from their user and developer communities at a fast rate. The often limited project resources do not allow them to deal with all issues. Instead, they have to prioritize them according to the project's priorities and the issues' severities. In this paper, we propose a novel approach to automated bug prioritization based on the natural language text of the bug reports that are stored in the open bug repositories of the issue-tracking systems. We conduct topic modeling using a variant of LDA called TopicMiner-MTM and text classification with the BERT large language model to achieve a higher performance level compared to the state-of-the-art. Experimental results using an existing reference dataset containing 85,156 bug reports of the Eclipse Platform project indicate that we outperform existing approaches in terms of Accuracy, Precision, Recall, and F1-measure of the bug report priority prediction.
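The abstract reports Accuracy, Precision, Recall, and F1-measure for priority prediction but does not say here how the per-class scores are aggregated. The sketch below shows one common way to compute these four metrics for a multi-class priority label, assuming macro-averaging over classes. The function name `evaluate`, the macro-averaging choice, and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro-averaging (unweighted mean over classes); the
    paper's abstract does not state which averaging scheme it uses.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        prec = tp[label] / predicted if predicted else 0.0
        rec = tp[label] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical priority labels for six bug reports (toy data).
    truth = ["P1", "P2", "P3", "P3", "P3", "P2"]
    predicted = ["P1", "P3", "P3", "P3", "P2", "P2"]
    for name, value in evaluate(truth, predicted).items():
        print(f"{name}: {value:.4f}")
```

Macro-averaging weights every priority level equally, which matters when the classes are imbalanced; a weighted or micro average would give different numbers on the same predictions.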
Related papers
- BugsRepo: A Comprehensive Curated Dataset of Bug Reports, Comments and Contributors Information from Bugzilla [0.0]
BugsRepo is a multifaceted dataset derived from Mozilla projects.
It includes a Bug report meta-data & Comments dataset with detailed records for 119,585 fixed or closed and resolved bug reports.
Second, BugsRepo features a contributor information dataset comprising 19,351 Mozilla community members.
Third, the dataset provides a structured bug report subset of 10,351 well-structured bug reports.
arXiv Detail & Related papers (2025-04-26T05:24:21Z) - Automated Duplicate Bug Report Detection in Large Open Bug Repositories [3.481985817302898]
Many users and contributors of large open-source projects report software defects or enhancement requests (known as bug reports) to the issue-tracking systems. We propose a novel approach based on machine learning methods that can automatically detect duplicate bug reports in an open bug repository.
arXiv Detail & Related papers (2025-04-21T01:55:54Z) - An Empirical Study on the Classification of Bug Reports with Machine Learning [1.1499574149885023]
We study how different factors (e.g., project language, report content) can influence the performance of models in handling classification of issue reports. Using the report title or description does not significantly differ; Support Vector Machine, Logistic Regression, and Random Forest are effective in classifying issue reports. Models based on heterogeneous projects can classify reports from projects not present during training.
arXiv Detail & Related papers (2025-03-01T23:19:56Z) - Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z) - Toward Rapid Bug Resolution for Android Apps [0.4759142872591625]
This paper describes the existing limitations of bug reports and identifies potential strategies for addressing them.
Our vision encompasses a future where the alleviation of these limitations and successful execution of our proposed new research directions can benefit both reporters and developers.
arXiv Detail & Related papers (2023-12-23T18:29:06Z) - PrAIoritize: Automated Early Prediction and Prioritization of Vulnerabilities in Smart Contracts [1.081463830315253]
Smart contracts are prone to numerous security threats due to undisclosed vulnerabilities and code weaknesses.
Efficient prioritization is crucial for smart contract security.
Our research aims to provide an automated approach, PrAIoritize, for prioritizing and predicting critical code weaknesses.
arXiv Detail & Related papers (2023-08-21T23:30:39Z) - Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule and Query-based solutions recommend a long list of potential similar bug reports with no clear ranking.
In this paper, we have proposed a solution using a combination of NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - Shortcomings of Question Answering Based Factuality Frameworks for Error
Localization [51.01957350348377]
We show that question answering (QA)-based factuality metrics fail to correctly identify error spans in generated summaries.
Our analysis reveals a major reason for such poor localization: questions generated by the QG module often inherit errors from non-factual summaries which are then propagated further into downstream modules.
Our experiments conclusively show that there exist fundamental issues with localization using the QA framework which cannot be fixed solely by stronger QA and QG models.
arXiv Detail & Related papers (2022-10-13T05:23:38Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z) - S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.