GitRank: A Framework to Rank GitHub Repositories
- URL: http://arxiv.org/abs/2205.02360v1
- Date: Wed, 4 May 2022 23:42:30 GMT
- Title: GitRank: A Framework to Rank GitHub Repositories
- Authors: Niranjan Hasabnis
- Abstract summary: Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems.
In this hackathon, we utilize known code quality measures and the GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-source repositories provide a wealth of information and are
increasingly being used to build artificial intelligence (AI) based systems to
solve problems in software engineering. Open-source repositories can vary in
quality, and low-quality repositories could degrade the performance of these
systems. Evaluating the quality of open-source repositories, a measure that is
not available directly on code hosting sites such as GitHub, is thus important.
In this hackathon, we utilize known code quality measures and the GrimoireLab
toolkit to implement a framework, named GitRank, to rank open-source
repositories on three different criteria. We discuss our findings and
preliminary evaluation in this hackathon report.
Related papers
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
- Unveiling A Hidden Risk: Exposing Educational but Malicious Repositories in GitHub [0.0]
We use ChatGPT to understand and annotate the content published in software repositories.
We carry out a systematic study on a collection of 35.2K GitHub repositories claimed to be created for educational purposes only.
arXiv Detail & Related papers (2024-03-07T11:36:09Z)
- RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories? [0.0]
Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing.
MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming.
MP systems either do not consider the quality of code repositories or use atypical quality measures.
arXiv Detail & Related papers (2022-09-24T07:34:18Z)
- Automatically Categorising GitHub Repositories by Application Domain [14.265666415804025]
GitHub is the largest host of open source software on the Internet.
It is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains.
Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository.
arXiv Detail & Related papers (2022-07-30T16:27:16Z)
- LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit.
Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers.
We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)