Testing Refactoring Engine via Historical Bug Report driven LLM
- URL: http://arxiv.org/abs/2501.09879v2
- Date: Wed, 22 Jan 2025 01:38:36 GMT
- Title: Testing Refactoring Engine via Historical Bug Report driven LLM
- Authors: Haibo Wang, Zhuolin Xu, Shin Hwei Tan
- Abstract summary: Refactoring is the process of restructuring existing code without changing its external behavior.
We propose RETESTER, an LLM-based framework for automated refactoring engine testing.
- Score: 6.852749659993347
- License:
- Abstract: Refactoring is the process of restructuring existing code without changing its external behavior while improving its internal structure. Refactoring engines are integral components of modern Integrated Development Environments (IDEs) and can automate or semi-automate this process to enhance code readability, reduce complexity, and improve the maintainability of software products. Similar to traditional software systems such as compilers, refactoring engines may also contain bugs that can lead to unexpected behaviors. In this paper, we propose a novel approach called RETESTER, an LLM-based framework for automated refactoring engine testing. Specifically, by using input program structure templates extracted from historical bug reports and input program characteristics that are error-prone, we design chain-of-thought (CoT) prompts to perform refactoring-preserving transformations. The generated variants are then tested on the latest version of refactoring engines using differential testing. We evaluate RETESTER on the two most popular modern refactoring engines (i.e., ECLIPSE and INTELLIJ IDEA). It successfully revealed 18 new bugs in the latest version of those refactoring engines. By the time we submitted our paper, seven of them had been confirmed by their developers, and three had been fixed.
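The differential-testing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `eclipse_refactor` and `intellij_refactor` commands below are hypothetical headless wrappers around each engine, and the LLM-driven variant generation is assumed to have already produced `variant_source`.

```python
# Minimal sketch of differential testing across two refactoring engines.
# The engine wrapper commands are hypothetical placeholders, not real CLI tools.
import subprocess
import tempfile
from pathlib import Path

ENGINES = {
    "eclipse": ["eclipse_refactor"],    # hypothetical headless wrapper
    "intellij": ["intellij_refactor"],  # hypothetical headless wrapper
}

def apply_refactoring(engine_cmd: list[str], source_file: Path, refactoring: str) -> str | None:
    """Run one engine on the input program; return the refactored source, or None on failure."""
    result = subprocess.run(
        engine_cmd + [refactoring, str(source_file)],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else None

def differential_test(variant_source: str, refactoring: str) -> str:
    """Flag a candidate bug when the two engines disagree on the same variant."""
    with tempfile.NamedTemporaryFile("w", suffix=".java", delete=False) as f:
        f.write(variant_source)
        source_file = Path(f.name)

    outputs = {name: apply_refactoring(cmd, source_file, refactoring)
               for name, cmd in ENGINES.items()}

    if None in outputs.values():
        return "candidate bug: an engine rejected or crashed on the variant"
    if len(set(outputs.values())) > 1:
        return "candidate bug: engines produced different refactored programs"
    return "consistent: no discrepancy observed"
```

A disagreement (or a crash in one engine) only marks the variant for manual inspection; it does not by itself show which engine is at fault.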
Related papers
- Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far through advances like structured outputs, process supervision, and test-time compute.
We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z) - An Empirical Study on the Potential of LLMs in Automated Software Refactoring [9.157968996300417]
We investigate the potential of large language models (LLMs) in automated software refactoring.
We find that 13 out of the 176 solutions suggested by ChatGPT and 9 out of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors.
arXiv Detail & Related papers (2024-11-07T05:35:55Z) - Automated Unit Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.
We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects.
We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z) - An Empirical Study of Refactoring Engine Bugs [7.412890903261693]
We present the first systematic study of refactoring engine bugs by analyzing bugs in Eclipse, IntelliJ IDEA, and NetBeans.
We analyzed these bugs according to their types, symptoms, root causes, and triggering conditions.
Our transferability study revealed 130 new bugs in the latest version of those engines.
arXiv Detail & Related papers (2024-09-22T22:09:39Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
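As a rough illustration of the retrieval-augmented idea, the sketch below ranks previous bug-fix pairs by plain token-overlap similarity and builds a prompt from the best match; RAP-Gen itself uses a trained hybrid retriever and feeds the prompt to CodeT5, neither of which is shown here.

```python
# Simplified sketch of retrieval-augmented patch generation, not RAP-Gen itself:
# token-overlap similarity stands in for the paper's trained hybrid retriever,
# and the resulting prompt would be passed to a patch-generation model (not shown).

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def retrieve_fix_pattern(buggy_code: str,
                         bugfix_pairs: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the (buggy, fixed) pair most similar to the code under repair."""
    return max(bugfix_pairs, key=lambda pair: token_overlap(buggy_code, pair[0]))

def build_prompt(buggy_code: str, bugfix_pairs: list[tuple[str, str]]) -> str:
    """Compose a retrieval-augmented prompt: exemplar fix pattern plus the code to repair."""
    similar_bug, similar_fix = retrieve_fix_pattern(buggy_code, bugfix_pairs)
    return ("### Similar bug:\n" + similar_bug +
            "\n### Its fix:\n" + similar_fix +
            "\n### Buggy code to repair:\n" + buggy_code +
            "\n### Fix:\n")
```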
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename
Refactoring [57.8069006460087]
We study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names.
We show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.
arXiv Detail & Related papers (2023-05-28T12:29:39Z) - Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual intervention from developers to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z) - How We Refactor and How We Document it? On the Use of Supervised Machine
Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into three categories, namely Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)