Testing Refactoring Engine via Historical Bug Report driven LLM
- URL: http://arxiv.org/abs/2501.09879v2
- Date: Wed, 22 Jan 2025 01:38:36 GMT
- Title: Testing Refactoring Engine via Historical Bug Report driven LLM
- Authors: Haibo Wang, Zhuolin Xu, Shin Hwei Tan
- Abstract summary: Refactoring is the process of restructuring existing code without changing its external behavior.
We propose RETESTER, an LLM-based framework for automated refactoring engine testing.
- Score: 6.852749659993347
- License:
- Abstract: Refactoring is the process of restructuring existing code without changing its external behavior while improving its internal structure. Refactoring engines are integral components of modern Integrated Development Environments (IDEs) and can automate or semi-automate this process to enhance code readability, reduce complexity, and improve the maintainability of software products. Similar to traditional software systems such as compilers, refactoring engines may also contain bugs that can lead to unexpected behaviors. In this paper, we propose a novel approach called RETESTER, an LLM-based framework for automated refactoring engine testing. Specifically, by using input program structure templates extracted from historical bug reports and input program characteristics that are error-prone, we design chain-of-thought (CoT) prompts to perform refactoring-preserving transformations. The generated variants are then tested on the latest version of refactoring engines using differential testing. We evaluate RETESTER on the two most popular modern refactoring engines (i.e., ECLIPSE and INTELLIJ IDEA). It successfully revealed 18 new bugs in the latest version of those refactoring engines. By the time we submitted our paper, seven of them had been confirmed by their developers, and three had been fixed.
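The differential-testing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `eclipse_refactor` and `intellij_refactor` commands below are hypothetical headless wrappers around each engine, and the LLM-driven variant generation is assumed to have already produced `variant_source`.

```python
# Minimal sketch of differential testing across two refactoring engines.
# The engine wrapper commands are hypothetical placeholders, not real CLI tools.
import subprocess
import tempfile
from pathlib import Path

ENGINES = {
    "eclipse": ["eclipse_refactor"],    # hypothetical headless wrapper
    "intellij": ["intellij_refactor"],  # hypothetical headless wrapper
}

def apply_refactoring(engine_cmd: list[str], source_file: Path, refactoring: str) -> str | None:
    """Run one engine on the input program; return the refactored source, or None on failure."""
    result = subprocess.run(
        engine_cmd + [refactoring, str(source_file)],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else None

def differential_test(variant_source: str, refactoring: str) -> str:
    """Flag a candidate bug when the two engines disagree on the same variant."""
    with tempfile.NamedTemporaryFile("w", suffix=".java", delete=False) as f:
        f.write(variant_source)
        source_file = Path(f.name)

    outputs = {name: apply_refactoring(cmd, source_file, refactoring)
               for name, cmd in ENGINES.items()}

    if None in outputs.values():
        return "candidate bug: an engine rejected or crashed on the variant"
    if len(set(outputs.values())) > 1:
        return "candidate bug: engines produced different refactored programs"
    return "consistent: no discrepancy observed"
```

A disagreement (or a crash in one engine) only marks the variant for manual inspection; it does not by itself show which engine is at fault.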
Related papers
- Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far through advances like structured outputs, process supervision, and test-time compute.
We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z) - An Empirical Study on the Potential of LLMs in Automated Software Refactoring [9.157968996300417]
We investigate the potential of large language models (LLMs) in automated software refactoring.
We find that 13 out of the 176 solutions suggested by ChatGPT and 9 out of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors.
arXiv Detail & Related papers (2024-11-07T05:35:55Z) - Automated Unit Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.
We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects.
We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z) - An Empirical Study of Refactoring Engine Bugs [7.412890903261693]
We present the first systematic study of refactoring engine bugs by analyzing bugs in Eclipse, IntelliJ IDEA, and NetBeans.
We analyzed these bugs according to their types, symptoms, root causes, and triggering conditions.
Our transferability study revealed 130 new bugs in the latest version of those engines.
arXiv Detail & Related papers (2024-09-22T22:09:39Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
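As a rough illustration of the retrieval-augmented idea, the sketch below ranks previous bug-fix pairs by plain token-overlap similarity and builds a prompt from the best match; RAP-Gen itself uses a trained hybrid retriever and feeds the prompt to CodeT5, neither of which is shown here.

```python
# Simplified sketch of retrieval-augmented patch generation, not RAP-Gen itself:
# token-overlap similarity stands in for the paper's trained hybrid retriever,
# and the resulting prompt would be passed to a patch-generation model (not shown).

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def retrieve_fix_pattern(buggy_code: str,
                         bugfix_pairs: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the (buggy, fixed) pair most similar to the code under repair."""
    return max(bugfix_pairs, key=lambda pair: token_overlap(buggy_code, pair[0]))

def build_prompt(buggy_code: str, bugfix_pairs: list[tuple[str, str]]) -> str:
    """Compose a retrieval-augmented prompt: exemplar fix pattern plus the code to repair."""
    similar_bug, similar_fix = retrieve_fix_pattern(buggy_code, bugfix_pairs)
    return ("### Similar bug:\n" + similar_bug +
            "\n### Its fix:\n" + similar_fix +
            "\n### Buggy code to repair:\n" + buggy_code +
            "\n### Fix:\n")
```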
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename
Refactoring [57.8069006460087]
We study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names.
We show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.
arXiv Detail & Related papers (2023-05-28T12:29:39Z) - Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual intervention from developers to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z) - How We Refactor and How We Document it? On the Use of Supervised Machine
Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into three categories, namely Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)