Frustrated with Code Quality Issues? LLMs can Help!
- URL: http://arxiv.org/abs/2309.12938v1
- Date: Fri, 22 Sep 2023 15:37:07 GMT
- Title: Frustrated with Code Quality Issues? LLMs can Help!
- Authors: Nalin Wadhwa, Jui Pradhan, Atharv Sonwane, Surya Prakash Sahu,
Nagarajan Natarajan, Aditya Kanade, Suresh Parthasarathy, Sriram Rajamani
- Abstract summary: Static analysis tools are used in developer workflows to flag code quality issues.
Developers need to spend extra effort to revise their code to improve its quality based on the tool findings.
We present a tool, CORE (short for COde REvisions), to assist developers in revising code to resolve code quality issues.
- Score: 7.67768651817923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As software projects progress, quality of code assumes paramount importance
as it affects reliability, maintainability and security of software. For this
reason, static analysis tools are used in developer workflows to flag code
quality issues. However, developers need to spend extra effort to revise their
code to improve its quality based on the tool findings. In this work, we
investigate the use of (instruction-following) large language models (LLMs) to
assist developers in revising code to resolve code quality issues. We present a
tool, CORE (short for COde REvisions), architected as a pair of LLMs: a
proposer and a ranker. Providers of static
analysis tools recommend ways to mitigate the tool warnings and developers
follow them to revise their code. The \emph{proposer LLM} of CORE takes the
same set of recommendations and applies them to generate candidate code
revisions. The candidates which pass the static quality checks are retained.
However, the LLM may introduce subtle, unintended functionality changes which
may go undetected by the static analysis. The \emph{ranker LLM} evaluates the
changes made by the proposer using a rubric that closely follows the acceptance
criteria that a developer would enforce. CORE uses the scores assigned by the
ranker LLM to rank the candidate revisions before presenting them to the
developer. CORE could revise 59.2% of Python files (across 52 quality checks) so
that they pass scrutiny by both a tool and a human reviewer. The ranker LLM is
able to reduce false positives by 25.8% in these cases. CORE produced revisions
that passed the static analysis tool in 76.8% of Java files (across 10 quality
checks), comparable to the 78.3% achieved by a specialized program repair tool,
with significantly less engineering effort.
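To make the proposer-ranker architecture concrete, the abstract's pipeline can be pictured as a minimal Python sketch. This is an illustration only, assuming hypothetical helpers (`propose_llm`, `rank_llm`, `run_static_checks`); none of these names or prompts come from the paper itself.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    revised_code: str
    score: float = 0.0

def propose_revisions(code, warning, recommendation, propose_llm, n=5):
    """Proposer LLM: apply the tool provider's recommendation to the flagged code."""
    prompt = (
        f"Static analysis warning: {warning}\n"
        f"Recommended fix: {recommendation}\n"
        f"Revise the following code to resolve the warning:\n{code}"
    )
    # One call per candidate; sampling with temperature would diversify them.
    return [Candidate(propose_llm(prompt)) for _ in range(n)]

def rank_revisions(code, candidates, rank_llm):
    """Ranker LLM: score each candidate with a reviewer-style rubric."""
    rubric = ("Score 0-10: does the revision resolve the issue without "
              "changing the code's intended behavior? Reply with the number only.")
    for cand in candidates:
        reply = rank_llm(f"{rubric}\nOriginal:\n{code}\nRevision:\n{cand.revised_code}")
        cand.score = float(reply)  # assumes the ranker replies with a bare number
    return sorted(candidates, key=lambda c: c.score, reverse=True)

def core_pipeline(code, warning, recommendation,
                  propose_llm, rank_llm, run_static_checks):
    candidates = propose_revisions(code, warning, recommendation, propose_llm)
    # Static quality checks prune candidates first (cheap, but they can miss
    # subtle behavior changes) ...
    passing = [c for c in candidates if run_static_checks(c.revised_code)]
    # ... then the ranker orders the survivors before a developer reviews them.
    return rank_revisions(code, passing, rank_llm)
```

The order mirrors the abstract: static checks filter candidates first, and the ranker's rubric scores order what remains, precisely because static analysis alone may not catch unintended functionality changes.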
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair [9.562123938545522]
toolname can integrate various code search, generation, and repair tools, combining these three research areas for the first time.
We conduct preliminary experiments to demonstrate the potential of our framework, e.g., helping CodeLlama solve 267 programming problems with an improvement of 62.53%.
arXiv Detail & Related papers (2024-09-05T06:24:29Z)
- SpecRover: Code Intent Extraction via LLMs [7.742980618437681]
Specification inference can be useful for producing high-quality program patches.
Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover.
In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover.
arXiv Detail & Related papers (2024-08-05T04:53:01Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only the essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs [10.510325069289324]
We propose a self-refinement method aimed at improving the reliability of code generated by LLMs.
Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code.
Our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code; a minimal sketch of this refinement loop appears after this list.
arXiv Detail & Related papers (2024-05-22T19:02:50Z)
- AI-powered Code Review with LLMs: Early Results [10.37036924997437]
We present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based agent.
Our proposed LLM-based AI agent model is trained on large code repositories.
It aims to detect code smells, identify potential bugs, provide suggestions for improvement, and optimize the code.
arXiv Detail & Related papers (2024-04-29T08:27:50Z)
- CYCLE: Learning to Self-Refine the Code Generation [19.71833229434497]
We propose the CYCLE framework, which learns to self-refine faulty generations according to the available feedback.
We implement four variants of CYCLE with parameter counts of 350M, 1B, 2B, and 3B.
The results reveal that CYCLE successfully maintains, sometimes improves, the quality of one-time code generation, while significantly improving the self-refinement capability of code LMs.
arXiv Detail & Related papers (2024-03-27T16:45:02Z)
- Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a substantial performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
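The targeted-VQ refinement loop summarized in the "Chain of Targeted Verification Questions" entry above can be pictured as a short sketch. This is an illustrative reconstruction from the two-sentence summary alone, assuming a hypothetical `llm` prompt-to-text completion function; the prompts below are not the paper's actual templates.

```python
def generate_verification_questions(llm, code):
    """Ask the model for targeted questions that probe likely bug sites."""
    prompt = ("List targeted verification questions that probe potential bugs "
              f"in this code, one per line:\n{code}")
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]

def refine_code(llm, problem, rounds=2):
    """Generate initial code, then repair it by re-prompting with the VQs."""
    code = llm(f"Write code for this problem:\n{problem}")
    for _ in range(rounds):
        questions = generate_verification_questions(llm, code)
        if not questions:
            break
        # Re-prompt with the initial code plus the targeted questions,
        # as the summary describes.
        repair_prompt = (
            f"Problem:\n{problem}\n\nCurrent code:\n{code}\n\n"
            "Answer these verification questions and fix any bugs they reveal. "
            "Return only the corrected code.\n" + "\n".join(questions)
        )
        code = llm(repair_prompt)
    return code
```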
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.