Refactoring Programs Using Large Language Models with Few-Shot Examples
- URL: http://arxiv.org/abs/2311.11690v1
- Date: Mon, 20 Nov 2023 11:43:45 GMT
- Title: Refactoring Programs Using Large Language Models with Few-Shot Examples
- Authors: Atsushi Shirafuji, Yusuke Oda, Jun Suzuki, Makoto Morishita, Yutaka
Watanobe
- Abstract summary: We demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs.
We show that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity.
- Score: 20.48175387745551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A less complex and more straightforward program is a crucial factor that
enhances its maintainability and makes writing secure and bug-free programs
easier. However, because of the heavy workload involved and the risk of breaking
working programs, programmers are reluctant to refactor their code, which also
causes the loss of potential learning experiences. To mitigate this, we
demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less
complex versions of user-written Python programs, aiming to
encourage users to learn how to write better programs. We propose a method to
leverage few-shot prompting of the LLM by selecting the best-suited code
refactoring examples for each target programming problem, based on a prior
evaluation of one-shot prompting. The
quantitative evaluation shows that 95.68% of programs can be refactored by
generating 10 candidates each, resulting in a 17.35% reduction in the average
cyclomatic complexity and a 25.84% decrease in the average number of lines
after filtering only generated programs that are semantically correct.
Furthermore, the qualitative evaluation shows outstanding capability in code
formatting, while unnecessary behaviors such as deleting or translating
comments are also observed.
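The generate-and-filter pipeline described in the abstract can be pictured with a short sketch. The following is a minimal illustration under stated assumptions, not the authors' implementation: `call_llm` is a hypothetical stand-in for a GPT-3.5 API call, the prompt wording is assumed, cyclomatic complexity is measured with the `radon` library, and the semantic-correctness check against the problem's test cases is omitted.

```python
# Minimal sketch (not the paper's implementation). Assumptions are marked below.
from radon.complexity import cc_visit  # pip install radon


def average_cyclomatic_complexity(source: str) -> float:
    """Average cyclomatic complexity over all functions/classes found in `source`."""
    blocks = cc_visit(source)
    return sum(b.complexity for b in blocks) / len(blocks) if blocks else 0.0


def build_few_shot_prompt(examples, target_code: str) -> str:
    """Assemble a prompt from (before, after) refactoring example pairs chosen
    for the target problem, followed by the program to refactor.
    The exact prompt wording here is an assumption, not the paper's template."""
    parts = ["Refactor the following Python programs to be less complex "
             "while preserving their behavior.\n"]
    for before, after in examples:
        parts.append(f"### Before\n{before}\n### After\n{after}\n")
    parts.append(f"### Before\n{target_code}\n### After\n")
    return "\n".join(parts)


def refactor_candidates(call_llm, examples, target_code: str, n: int = 10):
    """Generate `n` candidates and keep those that parse and reduce the average
    cyclomatic complexity. `call_llm(prompt) -> str` is a hypothetical stand-in
    for a GPT-3.5 API call; checking semantic equivalence by running the
    problem's test cases (as done in the paper) is omitted for brevity."""
    baseline = average_cyclomatic_complexity(target_code)
    prompt = build_few_shot_prompt(examples, target_code)
    kept = []
    for _ in range(n):
        candidate = call_llm(prompt)
        try:
            complexity = average_cyclomatic_complexity(candidate)
        except SyntaxError:
            continue  # discard outputs that are not valid Python
        if complexity < baseline:
            kept.append((complexity, candidate))
    return sorted(kept)  # lowest-complexity candidates first
```

The per-problem selection of the `examples` pairs is done beforehand, based on the prior one-shot evaluation mentioned in the abstract; this sketch only covers prompt assembly, generation, and complexity-based filtering.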
Related papers
- An Empirical Study on the Code Refactoring Capability of Large Language Models [0.5852077003870416]
This study empirically evaluates StarCoder2, an LLM optimized for code generation, in code refactoring across 30 open-source Java projects.
We compare StarCoder2's refactoring performance against human developers, focusing on (1) code quality improvements, (2) types and effectiveness of the refactored code smells, and (3) enhancements through one-shot and chain-of-thought prompting.
arXiv Detail & Related papers (2024-11-04T17:46:20Z) - NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z) - Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments [26.236420215606238]
We develop a framework called PaR that is powered by the Large Language Model.
PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair.
The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2024-04-02T09:12:21Z) - ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Refactoring for Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z) - LPR: Large Language Models-Aided Program Reduction [9.772279651428406]
This paper proposes LPR, the first technique utilizing LLMs to perform language-specific program reduction for multiple languages.
For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript.
arXiv Detail & Related papers (2023-12-20T14:43:36Z) - Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
arXiv Detail & Related papers (2022-11-29T18:56:33Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z) - Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks (a minimal selection sketch follows this list).
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
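The selection step of the execution-result-based minimum Bayes risk decoding entry above can be illustrated with a small sketch. This is a simplification under assumptions rather than the authors' code: candidates are represented as plain Python callables instead of generated code strings, and agreement is counted over exact output equality.

```python
# Minimal sketch of execution-result-based minimum Bayes risk selection.
# Assumption: candidates are plain callables; a real system would execute
# generated code strings in a sandbox instead.
from typing import Any, Callable, List


def run_safely(program: Callable[[Any], Any], test_input: Any) -> Any:
    """Execute one candidate on one input, mapping crashes to a sentinel value."""
    try:
        return program(test_input)
    except Exception:
        return "<error>"


def mbr_exec_select(candidates: List[Callable], test_inputs: List[Any]) -> Callable:
    """Return the candidate whose outputs on the test inputs agree with the
    largest number of other candidates (execution-based MBR decoding)."""
    outputs = [[run_safely(c, x) for x in test_inputs] for c in candidates]
    scores = [
        sum(out_i == out_j for j, out_j in enumerate(outputs) if j != i)
        for i, out_i in enumerate(outputs)
    ]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```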
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.