Mining Software Repositories for Expert Recommendation
- URL: http://arxiv.org/abs/2504.16343v1
- Date: Wed, 23 Apr 2025 01:41:08 GMT
- Title: Mining Software Repositories for Expert Recommendation
- Authors: Chad Marshall, Andrew Barovic, Armin Moin,
- Abstract summary: We propose an automated approach to bug assignment to developers in large open-source software projects.<n>This way, we assist human bug triagers who are in charge of finding the best developer with the right level of expertise in a particular area to be assigned to a newly reported issue.
- Score: 3.481985817302898
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose an automated approach to bug assignment to developers in large open-source software projects. This way, we assist human bug triagers who are in charge of finding the best developer with the right level of expertise in a particular area to be assigned to a newly reported issue. Our approach is based on the history of software development as documented in the issue tracking systems. We deploy BERTopic and techniques from TopicMiner. Our approach works based on the bug reports' features, such as the corresponding products and components, as well as their priority and severity levels. We sort developers based on their experience with specific combinations of new reports. The evaluation is performed using Top-k accuracy, and the results are compared with the reported results in prior work, namely TopicMiner MTM, BUGZIE, Bug triaging via deep Reinforcement Learning BT-RL, and LDA-SVM. The evaluation data come from various Eclipse and Mozilla projects, such as JDT, Firefox, and Thunderbird.
Related papers
- Automated Bug Report Prioritization in Large Open-Source Projects [3.9134031118910264]
We propose a novel approach to automated bug prioritization based on the natural language text of the bug reports.<n>We conduct topic modeling using a variant of LDA called TopicMiner-MTM and text classification with the BERT large language model.<n> Experimental results using an existing reference dataset containing 85,156 bug reports of the Eclipse Platform project indicate that we outperform existing approaches in terms of Accuracy, Precision, Recall, and F1-measure of the bug report priority prediction.
arXiv Detail & Related papers (2025-04-22T13:57:48Z) - Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution.<n>Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy.<n>In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval.<n>DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task.<n> Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z) - MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance
Activities [3.2228025627337864]
Software development projects rely on issue tracking systems at the core of tracking maintenance tasks such as bug reports, and enhancement requests.
The handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report making it a labor-intensive task.
We present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports in their respective category and further assigning the issue-reports to a developer with relevant expertise.
arXiv Detail & Related papers (2023-08-31T05:15:42Z) - Recommending Bug Assignment Approaches for Individual Bug Reports: An
Empirical Investigation [8.186068333538893]
Multiple approaches have been proposed to automatically recommend potential developers who can address bug reports.
These approaches are typically designed to work for any bug report submitted to any software project.
We conducted an empirical study to validate this conjecture, using three bug assignment approaches applied on 2,249 bug reports from two open source systems.
arXiv Detail & Related papers (2023-05-29T23:02:56Z) - Multimodal Recommender Systems: A Survey [50.23505070348051]
Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently.
In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views.
To access more details of the surveyed papers, such as implementation code, we open source a repository.
arXiv Detail & Related papers (2023-02-08T05:12:54Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - S-DABT: Schedule and Dependency-Aware Bug Triage in Open-Source Bug
Tracking Systems [0.0]
Manual bug fixing scheduling can be time-consuming, cumbersome, and error-prone.
We propose the Schedule and Dependency-aware Bug Triage (S-DABT) to assign bugs to suitable developers.
arXiv Detail & Related papers (2022-04-12T17:36:43Z) - Early Detection of Security-Relevant Bug Reports using Machine Learning:
How Far Are We? [6.438136820117887]
In a typical maintenance scenario, security-relevant bug reports are prioritised by the development team when preparing corrective patches.
Open security-relevant bug reports can become a critical leak of sensitive information that attackers can leverage to perform zero-day attacks.
In recent years, approaches for the detection of security-relevant bug reports based on machine learning have been reported with promising performance.
arXiv Detail & Related papers (2021-12-19T11:30:29Z) - S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.