Related papers: GitRank: A Framework to Rank GitHub Repositories

URL: http://arxiv.org/abs/2205.02360v1
Date: Wed, 4 May 2022 23:42:30 GMT
Title: GitRank: A Framework to Rank GitHub Repositories
Authors: Niranjan Hasabnis
Abstract summary: Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems. In this hackathon, we utilize known code quality measures and the GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems to solve problems in software engineering. Open-source repositories vary in quality, and bad-quality repositories could degrade the performance of these systems. Evaluating the quality of open-source repositories, which is not directly available on code hosting sites such as GitHub, is thus important. In this hackathon, we utilize known code quality measures and the GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria. We discuss our findings and preliminary evaluation in this hackathon report.
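The abstract does not say how per-criterion results are combined into one ranking. Below is a minimal sketch of one common approach: min-max normalize each metric across repositories, then sort by a weighted average. The three criteria (`code_quality`, `activity`, `popularity`), the `RepoMetrics` type and the `rank_repos` function are hypothetical illustrations. This is not GitRank's actual implementation, which draws on the GrimoireLab toolkit.

```python
from dataclasses import dataclass

# Hypothetical per-repository measurements. The metric names are
# illustrative assumptions, not GitRank's actual three criteria.
@dataclass
class RepoMetrics:
    name: str
    code_quality: float  # e.g. a static-analysis score (higher is better)
    activity: float      # e.g. commits in the last year
    popularity: float    # e.g. star count

CRITERIA = ("code_quality", "activity", "popularity")

def min_max(values):
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_repos(repos, weights=None):
    """Return (name, score) pairs sorted from best to worst (hypothetical helper)."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total_w = sum(weights[c] for c in CRITERIA)
    # Normalize each criterion independently so differing units are comparable.
    normalized = {c: min_max([getattr(r, c) for r in repos]) for c in CRITERIA}
    scored = []
    for i, repo in enumerate(repos):
        score = sum(weights[c] * normalized[c][i] for c in CRITERIA) / total_w
        scored.append((repo.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing weights lets one criterion dominate. For example, `rank_repos(repos, {"code_quality": 5, "activity": 1, "popularity": 1})` favors code quality over activity and popularity. Min-max scaling keeps a raw star count from swamping a quality score that lies between 0 and 1.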

Related papers

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
CodeRAG-Bench: Can Retrieval Augment Code Generation?
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
How to Understand Whole Software Repository?
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE). We develop a novel method named RepoUnderstander that guides agents to comprehensively understand entire repositories. To better utilize repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
Unveiling A Hidden Risk: Exposing Educational but Malicious Repositories in GitHub
We use ChatGPT to understand and annotate the content published in software repositories. We carry out a systematic study on a collection of 35.2K GitHub repositories claimed to be created for educational purposes only.
arXiv Detail & Related papers (2024-03-07T11:36:09Z)
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation. We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z)
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
RepoCoder is a framework to streamline the repository-level code completion process. It incorporates a similarity-based retriever and a pre-trained code language model. It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories?
Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing. MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming. MP systems either do not consider the quality of code repositories or use atypical quality measures.
arXiv Detail & Related papers (2022-09-24T07:34:18Z)
Automatically Categorising GitHub Repositories by Application Domain
GitHub is the largest host of open source software on the Internet. It is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository.
arXiv Detail & Related papers (2022-07-30T16:27:16Z)
The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative
We develop a novel, extensive sample of public open source project repositories outside of centralized platforms. Our sample projects tend to have more collaborators, are maintained for longer periods, and tend to be more focused on academic and scientific problems.
arXiv Detail & Related papers (2021-06-29T17:54:26Z)
LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs
We create a new dataset of GitHub projects called LabelGit. Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers. We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.