Refactoring Programs Using Large Language Models with Few-Shot Examples
- URL: http://arxiv.org/abs/2311.11690v1
- Date: Mon, 20 Nov 2023 11:43:45 GMT
- Title: Refactoring Programs Using Large Language Models with Few-Shot Examples
- Authors: Atsushi Shirafuji, Yusuke Oda, Jun Suzuki, Makoto Morishita, Yutaka
Watanobe
- Abstract summary: We demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs.
We show that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity.
- Score: 20.48175387745551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A less complex and more straightforward program is a crucial factor that
enhances its maintainability and makes writing secure and bug-free programs
easier. However, because of the heavy workload involved and the risk of breaking
working programs, programmers are reluctant to refactor their code, which also
causes the loss of potential learning experiences. To mitigate this, we
demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less
complex versions of user-written Python programs, aiming to
encourage users to learn how to write better programs. We propose a method to
leverage few-shot prompting of the LLM by selecting the best-suited code
refactoring examples for each target programming problem, based on a prior
evaluation of one-shot prompting. The
quantitative evaluation shows that 95.68% of programs can be refactored by
generating 10 candidates each, resulting in a 17.35% reduction in the average
cyclomatic complexity and a 25.84% decrease in the average number of lines
after filtering only generated programs that are semantically correct.
Furthermore, the qualitative evaluation shows outstanding capability in code
formatting, while unnecessary behaviors such as deleting or translating
comments are also observed.
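The generate-and-filter pipeline described in the abstract can be pictured with a short sketch. The following is a minimal illustration under stated assumptions, not the authors' implementation: `call_llm` is a hypothetical stand-in for a GPT-3.5 API call, the prompt wording is assumed, cyclomatic complexity is measured with the `radon` library, and the semantic-correctness check against the problem's test cases is omitted.

```python
# Minimal sketch (not the paper's implementation). Assumptions are marked below.
from radon.complexity import cc_visit  # pip install radon


def average_cyclomatic_complexity(source: str) -> float:
    """Average cyclomatic complexity over all functions/classes found in `source`."""
    blocks = cc_visit(source)
    return sum(b.complexity for b in blocks) / len(blocks) if blocks else 0.0


def build_few_shot_prompt(examples, target_code: str) -> str:
    """Assemble a prompt from (before, after) refactoring example pairs chosen
    for the target problem, followed by the program to refactor.
    The exact prompt wording here is an assumption, not the paper's template."""
    parts = ["Refactor the following Python programs to be less complex "
             "while preserving their behavior.\n"]
    for before, after in examples:
        parts.append(f"### Before\n{before}\n### After\n{after}\n")
    parts.append(f"### Before\n{target_code}\n### After\n")
    return "\n".join(parts)


def refactor_candidates(call_llm, examples, target_code: str, n: int = 10):
    """Generate `n` candidates and keep those that parse and reduce the average
    cyclomatic complexity. `call_llm(prompt) -> str` is a hypothetical stand-in
    for a GPT-3.5 API call; checking semantic equivalence by running the
    problem's test cases (as done in the paper) is omitted for brevity."""
    baseline = average_cyclomatic_complexity(target_code)
    prompt = build_few_shot_prompt(examples, target_code)
    kept = []
    for _ in range(n):
        candidate = call_llm(prompt)
        try:
            complexity = average_cyclomatic_complexity(candidate)
        except SyntaxError:
            continue  # discard outputs that are not valid Python
        if complexity < baseline:
            kept.append((complexity, candidate))
    return sorted(kept)  # lowest-complexity candidates first
```

The per-problem selection of the `examples` pairs is done beforehand, based on the prior one-shot evaluation mentioned in the abstract; this sketch only covers prompt assembly, generation, and complexity-based filtering.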
Related papers
- An Empirical Study on the Code Refactoring Capability of Large Language Models [0.5852077003870416]
This study empirically evaluates StarCoder2, an LLM optimized for code generation, in code refactoring across 30 open-source Java projects.
We compare StarCoder2's refactoring performance against human developers, focusing on (1) code quality improvements, (2) types and effectiveness of the refactored code smells, and (3) enhancements through one-shot and chain-of-thought prompting.
arXiv Detail & Related papers (2024-11-04T17:46:20Z) - NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z) - Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments [26.236420215606238]
We develop a framework called PaR that is powered by the Large Language Model.
PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair.
The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2024-04-02T09:12:21Z) - ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Refactoring for Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z) - LPR: Large Language Models-Aided Program Reduction [9.772279651428406]
This paper proposes LPR, the first technique utilizing LLMs to perform language-specific program reduction for multiple languages.
For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript.
arXiv Detail & Related papers (2023-12-20T14:43:36Z) - Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
arXiv Detail & Related papers (2022-11-29T18:56:33Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z) - Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks (a minimal selection sketch follows this list).
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
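The selection step of the execution-result-based minimum Bayes risk decoding entry above can be illustrated with a small sketch. This is a simplification under assumptions rather than the authors' code: candidates are represented as plain Python callables instead of generated code strings, and agreement is counted over exact output equality.

```python
# Minimal sketch of execution-result-based minimum Bayes risk selection.
# Assumption: candidates are plain callables; a real system would execute
# generated code strings in a sandbox instead.
from typing import Any, Callable, List


def run_safely(program: Callable[[Any], Any], test_input: Any) -> Any:
    """Execute one candidate on one input, mapping crashes to a sentinel value."""
    try:
        return program(test_input)
    except Exception:
        return "<error>"


def mbr_exec_select(candidates: List[Callable], test_inputs: List[Any]) -> Callable:
    """Return the candidate whose outputs on the test inputs agree with the
    largest number of other candidates (execution-based MBR decoding)."""
    outputs = [[run_safely(c, x) for x in test_inputs] for c in candidates]
    scores = [
        sum(out_i == out_j for j, out_j in enumerate(outputs) if j != i)
        for i, out_i in enumerate(outputs)
    ]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```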
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.