Classifying Issues in Open-source GitHub Repositories
- URL: http://arxiv.org/abs/2507.18982v1
- Date: Fri, 25 Jul 2025 06:20:54 GMT
- Title: Classifying Issues in Open-source GitHub Repositories
- Authors: Amir Hossain Raaj, Fairuz Nawer Meem, Sadia Afrin Mim,
- Abstract summary: GitHub is the most widely used platform for software maintenance in the open-source community. Developers report issues on GitHub from time to time while facing difficulties. Most of the GitHub repositories do not maintain regular labeling for the issues.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: GitHub is the most widely used platform for software maintenance in the open-source community. Developers report issues on GitHub from time to time while facing difficulties. Having labels on those issues can help developers easily address those issues with prior knowledge of labels. However, most of the GitHub repositories do not maintain regular labeling for the issues. The goal of this work is to classify issues in the open-source community using ML & DNN models. There are thousands of open-source repositories on GitHub. Some of the repositories label their issues properly whereas some of them do not. When issues are pre-labeled, the problem-solving process and the immediate assignment of corresponding personnel are facilitated for the team, thereby expediting the development process. In this work, we conducted an analysis of prominent GitHub open-source repositories. We classified the issues in some common labels which are: API, Documentation, Enhancement, Question, Easy, Help-wanted, Dependency, CI, Waiting for OP's response, Test, Bug, etc. Our study shows that DNN models outperf
Related papers
- SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
- Analyzing the Usage of Donation Platforms for PyPI Libraries [91.97201077607862]
This study analyzes the adoption of donation platforms in the PyPI ecosystem. GitHub Sponsors is the dominant platform, though many PyPI-listed links are outdated.
arXiv Detail & Related papers (2025-03-11T10:27:31Z)
- Visual Analysis of GitHub Issues to Gain Insights [2.9051263101214566]
This paper presents a prototype web application that generates visualizations to offer insights into issue timelines.
It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns.
arXiv Detail & Related papers (2024-07-30T15:17:57Z)
- Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
- MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution [47.850418420195304]
Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues.
We propose a novel Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution.
arXiv Detail & Related papers (2024-03-26T17:57:57Z)
- Characterizing Issue Management in Runtime Systems [0.38233569758620056]
We report an empirical study of around 118K issues from 34 runtime system repos in GitHub.
We found that issues regarding enhancement, test failure and bug are mostly posted on runtime system repositories.
82.65% of issues are tagged with labels, while only 28.30% of issues have designated assignees.
arXiv Detail & Related papers (2023-10-24T16:12:52Z)
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of $2,294$ software engineering problems drawn from real GitHub issues and corresponding pull requests across $12$ popular Python repositories.
We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z)
- Wait, wasn't that code here before? Detecting Outdated Software Documentation [9.45052138795667]
We present a GitHub Actions tool that automatically scans for outdated code element references.
More than a quarter of the 1000 most popular projects on GitHub contained at least one outdated reference.
arXiv Detail & Related papers (2023-07-10T00:52:29Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- Predicting Issue Types on GitHub [8.791809365994682]
Ticket Tagger is a GitHub app analyzing the issue title and description through machine learning techniques.
We empirically evaluated the tool's prediction performance on about 30,000 GitHub issues.
arXiv Detail & Related papers (2021-07-21T08:14:48Z)
- LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit.
Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers.
We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.