Are Machine Programming Systems using Right Source-Code Measures to
Select Code Repositories?
- URL: http://arxiv.org/abs/2209.11946v1
- Date: Sat, 24 Sep 2022 07:34:18 GMT
- Title: Are Machine Programming Systems using Right Source-Code Measures to
Select Code Repositories?
- Authors: Niranjan Hasabnis
- Abstract summary: Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing.
MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming.
MP systems either do not consider the quality of code repositories or use atypical quality measures.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine programming (MP) is an emerging field at the intersection of
deterministic and probabilistic computing, and it aims to assist software and
hardware engineers, among other applications. Along with powerful compute
resources, MP systems often rely on vast amounts of open-source code to learn
interesting properties about code and programming and to solve problems in the
areas of debugging, code recommendation, auto-completion, etc. Unfortunately,
several existing MP systems either do not consider the quality of code
repositories or use quality measures that differ from those typically used in
the software engineering community to select them. As such, the impact of code
repository quality on the performance of these systems needs to be studied.
In this preliminary paper, we evaluate the impact of repositories of different
quality on the performance of a candidate MP system. Towards that objective, we
develop a framework, named GitRank, to rank open-source repositories on
quality, maintainability, and popularity by leveraging existing research on
this topic. We then apply GitRank to evaluate the correlation between the
quality measures used by the candidate MP system and the quality measures used
by our framework. Our preliminary results reveal some correlation between the
quality measures used in GitRank and the performance of ControlFlag, our
candidate MP system, suggesting that some of the measures used in GitRank are
applicable to ControlFlag. However, the results also raise questions about the
right quality measures for code repositories used in MP systems. We believe
that our findings also generate interesting insights into the code quality
measures that affect the performance of MP systems.
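
As a rough illustration of the kind of correlation analysis described in the abstract (a minimal sketch, not the paper's actual pipeline), the snippet below aggregates hypothetical per-repository quality, maintainability, and popularity scores into a GitRank-style composite score and measures its Spearman rank correlation with a hypothetical MP-system performance value, such as ControlFlag's accuracy on each repository. The repository names, scores, and equal weights are all assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's implementation): combine
# hypothetical GitRank-style scores for a few repositories and check how
# well they correlate with a hypothetical MP-system performance measure.
from scipy.stats import spearmanr

# Hypothetical per-repository scores, each normalized to [0, 1]:
# (quality, maintainability, popularity)
repo_scores = {
    "repo-a": (0.91, 0.85, 0.40),
    "repo-b": (0.62, 0.70, 0.95),
    "repo-c": (0.45, 0.50, 0.30),
    "repo-d": (0.80, 0.60, 0.75),
}

# Hypothetical MP-system performance (e.g., ControlFlag accuracy) per repository.
mp_performance = {"repo-a": 0.78, "repo-b": 0.55, "repo-c": 0.41, "repo-d": 0.70}

# Equal weights are an assumption; GitRank's real aggregation may differ.
WEIGHTS = (1.0 / 3, 1.0 / 3, 1.0 / 3)


def composite_score(scores, weights=WEIGHTS):
    """Weighted sum of the three criteria into one composite score."""
    return sum(w * s for w, s in zip(weights, scores))


repos = sorted(repo_scores)
gitrank_scores = [composite_score(repo_scores[name]) for name in repos]
performance = [mp_performance[name] for name in repos]

# Spearman rank correlation between the quality-based ranking and performance.
rho, p_value = spearmanr(gitrank_scores, performance)
print(f"Spearman rho = {rho:.2f} (p-value = {p_value:.2f})")
```

In this setting, a strong positive correlation would suggest that the software-engineering quality measures aggregated by a GitRank-style ranking are also predictive of the MP system's performance, which is the question the paper investigates.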
Related papers
- On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o [1.5960340244043023]
This paper introduces CodeQUEST, a novel framework leveraging Large Language Models (LLMs) to iteratively evaluate and enhance code quality.
The framework has two main components; its Evaluator assesses code quality across ten dimensions, providing both quantitative scores and qualitative summaries.
Our study demonstrates that CodeQUEST can effectively and robustly evaluate code quality, with its assessments aligning with established code quality metrics.
arXiv Detail & Related papers (2025-02-11T09:27:00Z) - CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering [12.431784613373523]
We introduce CoReQA, a benchmark for Code Repository-level question answering.
CoReQA was constructed from GitHub issues and comments from 176 popular repositories across four programming languages.
We show that state-of-the-art proprietary and long-context models struggle to address repository-level questions effectively.
arXiv Detail & Related papers (2025-01-07T00:24:07Z) - DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components.
Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods.
arXiv Detail & Related papers (2024-08-25T07:10:36Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
To our knowledge, InfiBench is the first large-scale free-form question-answering (QA) benchmark for code.
It comprises 234 carefully selected, high-quality Stack Overflow questions spanning 15 programming languages.
We conduct a systematic evaluation of over 100 recent code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z) - RepoAgent: An LLM-Powered Open-Source Framework for Repository-level
Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z) - CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology [4.2990995991059275]
Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering.
We introduce CodePori, a novel system designed to automate code generation for large and complex software projects.
Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process.
arXiv Detail & Related papers (2024-02-02T13:42:50Z) - Finding Software Vulnerabilities in Open-Source C Projects via Bounded
Model Checking [2.9129603096077332]
We advocate that bounded model-checking techniques can efficiently detect vulnerabilities in general software systems.
We have developed and evaluated a methodology to verify large software systems using a state-of-the-art bounded model checker.
arXiv Detail & Related papers (2023-11-09T11:25:24Z) - Lessons from Formally Verified Deployed Software Systems (Extended version) [65.69802414600832]
This article examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use.
It considers the technologies used, the form of verification applied, the results obtained, and the lessons that the software industry should draw regarding its ability to benefit from formal verification techniques and tools.
arXiv Detail & Related papers (2023-01-05T18:18:46Z) - GitRank: A Framework to Rank GitHub Repositories [0.0]
Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems.
In this hackathon, we utilize known code quality measures and the GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria.
arXiv Detail & Related papers (2022-05-04T23:42:30Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.