RefModel: Detecting Refactorings using Foundation Models
- URL: http://arxiv.org/abs/2507.11346v1
- Date: Tue, 15 Jul 2025 14:20:56 GMT
- Title: RefModel: Detecting Refactorings using Foundation Models
- Authors: Pedro Simões, Rohit Gheyi, Rian Melo, Jonhnanthan Oliveira, Márcio Ribeiro, Wesley K. G. Assunção
- Abstract summary: We investigate the viability of using foundation models for refactoring detection, implemented in a tool named RefModel. We evaluate Phi4-14B and Claude 3.5 Sonnet on a dataset of 858 single-operation transformations applied to artificially generated Java programs. In real-world settings, Claude 3.5 Sonnet and Gemini 2.5 Pro jointly identified 97% of all refactorings, surpassing the best-performing static-analysis-based tools.
- Score: 2.2670483018110366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Refactoring is a common software engineering practice that improves code quality without altering program behavior. Although tools like ReExtractor+, RefactoringMiner, and RefDiff have been developed to detect refactorings automatically, they rely on complex rule definitions and static analysis, making them difficult to extend and generalize to other programming languages. In this paper, we investigate the viability of using foundation models for refactoring detection, implemented in a tool named RefModel. We evaluate Phi4-14B and Claude 3.5 Sonnet on a dataset of 858 single-operation transformations applied to artificially generated Java programs, covering widely-used refactoring types. We also extend our evaluation by including Gemini 2.5 Pro and o4-mini-high, assessing their performance on 44 real-world refactorings extracted from four open-source projects. These models are compared against RefactoringMiner, RefDiff, and ReExtractor+. RefModel is competitive with, and in some cases outperforms, traditional tools. In real-world settings, Claude 3.5 Sonnet and Gemini 2.5 Pro jointly identified 97% of all refactorings, surpassing the best-performing static-analysis-based tools. The models showed encouraging generalization to Python and Golang. They provide natural language explanations and require only a single sentence to define each refactoring type.
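The paper's implementation is not reproduced here, but the recipe the abstract describes, handing a foundation model a one-sentence definition of each refactoring type together with the before/after code and asking for a judgment plus a natural-language explanation, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not RefModel's actual code: `call_model` is a placeholder for whichever model API is used, and the definitions are illustrative paraphrases.

```python
# Minimal sketch of foundation-model-based refactoring detection in the
# spirit of RefModel. `call_model` stands in for any chat-completion
# client (Claude, Gemini, etc.); this is NOT the paper's code.

# One-sentence definitions per refactoring type (illustrative wording).
DEFINITIONS = {
    "Extract Method": "A code fragment is moved into a new method and "
                      "replaced by a call to that method.",
    "Rename Method": "A method is given a new name and all of its call "
                     "sites are updated consistently.",
    "Inline Method": "A method's body replaces its call sites and the "
                     "method itself is removed.",
}

def build_prompt(before: str, after: str) -> str:
    """Assemble a detection prompt from two versions of a program."""
    defs = "\n".join(f"- {name}: {text}" for name, text in DEFINITIONS.items())
    return (
        "You are given two versions of a Java program.\n"
        f"Refactoring types:\n{defs}\n\n"
        f"BEFORE:\n{before}\n\nAFTER:\n{after}\n\n"
        "Which of the listed refactorings, if any, were applied? "
        "Name the types and explain your answer in plain language."
    )

def detect_refactorings(before: str, after: str, call_model) -> str:
    """Run one detection query; `call_model` is an injected LLM client."""
    return call_model(build_prompt(before, after))
```

Because each refactoring type is defined by a single sentence of prompt text, extending such a detector to a new type, or to Python or Golang code, only means editing text, which is the extensibility argument the abstract makes.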
Related papers
- Refactoring $\neq$ Bug-Inducing: Improving Defect Prediction with Code Change Tactics Analysis [54.361900378970134]
Just-in-time defect prediction (JIT-DP) aims to predict the likelihood of code changes resulting in software defects at an early stage. Prior research has largely ignored code refactoring during both the evaluation and methodology phases, despite its prevalence. We propose Code chAnge Tactics (CAT) analysis to categorize code refactoring and its propagation, which improves labeling accuracy in the JIT-Defects4J dataset by 13.7%.
arXiv Detail & Related papers (2025-07-25T23:29:25Z) - Assessing the Bug-Proneness of Refactored Code: A Longitudinal Multi-Project Study [43.65862440745159]
Refactoring is a common practice in software development, aimed at improving the internal code structure in order to make it easier to understand and modify. It is often assumed that refactoring makes the code less prone to bugs. However, in practice, refactoring is a complex task and is applied in different ways. Therefore, certain refactorings can inadvertently make the code more prone to bugs.
arXiv Detail & Related papers (2025-05-12T19:12:30Z) - Refactoring Detection in C++ Programs with RefactoringMiner++ [45.045206894182776]
We present RefactoringMiner++, a refactoring detection tool based on the current state of the art: RefactoringMiner 3. While the latter focuses exclusively on Java, our tool is -- to the best of our knowledge -- the first publicly available refactoring detection tool for C++ projects.
arXiv Detail & Related papers (2025-02-24T23:17:35Z) - Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate the effectiveness of GPT-4 in recommending and suggesting idiomatic refactoring actions. Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
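To make the studied task concrete, here is an invented toy example (not taken from the paper) of the kind of non-idiomatic Python an LLM would be asked to rewrite idiomatically:

```python
# Invented example of the task: rewriting non-idiomatic Python into the
# idiomatic equivalent a model might recommend.
values = [1, 2, 3, 4]

# Non-idiomatic: index-based loop that builds a list element by element.
squares = []
for i in range(len(values)):
    squares.append(values[i] ** 2)

# Idiomatic refactoring: iterate over the items with a comprehension.
squares = [v ** 2 for v in values]
```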
arXiv Detail & Related papers (2025-01-28T15:41:54Z) - Testing Refactoring Engine via Historical Bug Report driven LLM [6.852749659993347]
Refactoring is the process of restructuring existing code without changing its external behavior. We propose RETESTER, a framework for automated refactoring engine testing.
arXiv Detail & Related papers (2025-01-16T23:31:49Z) - ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Refactoring for Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
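As a toy illustration of the refactorization ReGAL performs (invented here; LOGO turtle graphics is one of the evaluated domains), duplicated logic across programs can be lifted into one reusable library function:

```python
# Invented toy example of abstraction discovery: two programs share a
# "draw an n-sided polygon" pattern that can become a library helper.

# Before: each program re-implements the drawing loop.
def draw_square(turtle):
    for _ in range(4):
        turtle.forward(50)
        turtle.left(90)

def draw_triangle(turtle):
    for _ in range(3):
        turtle.forward(50)
        turtle.left(120)

# After refactorization: one shared helper; each program shrinks to a
# single call, which is easier for a code model to predict.
def draw_polygon(turtle, sides, length=50):
    for _ in range(sides):
        turtle.forward(length)
        turtle.left(360 / sides)
```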
arXiv Detail & Related papers (2024-01-29T18:45:30Z) - RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring [57.8069006460087]
We study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names.
We show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.
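RefBERT predicts better names with a pre-trained transformer; the mechanical half of a rename refactoring, rewriting every occurrence once a new name is chosen, can be sketched with Python's standard `ast` module. This simplified sketch ignores scoping and shadowing, which a real rename refactoring must respect:

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one variable in a module.

    Simplification: no scope analysis, so shadowed names would be
    renamed too; a production rename refactoring must handle scopes.
    """

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node

source = "x = read_input()\nprint(x * 2)"
tree = RenameVariable("x", "raw_value").visit(ast.parse(source))
print(ast.unparse(tree))  # raw_value = read_input() / print(raw_value * 2)
```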
arXiv Detail & Related papers (2023-05-28T12:29:39Z) - Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual developer intervention to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z) - How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes refactoring commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their refactoring activities.
arXiv Detail & Related papers (2020-10-26T20:33:17Z) - Toward the Automatic Classification of Self-Affirmed Refactoring [22.27416971215152]
Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages.
We propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the common quality improvement categories.
Our model is able to accurately classify commits, outperforming the pattern-based and random approaches, and allowing the discovery of 40 more relevant SAR patterns.
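A minimal sketch of such a two-step pipeline, assuming a TF-IDF plus linear-classifier setup with toy training data and illustrative category labels (the paper's actual features, models, and data differ), might look like:

```python
# Hedged sketch of two-step commit classification: step 1 decides whether
# a message describes refactoring at all; step 2 assigns a quality
# category. Toy data; labels and models are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

step1 = make_pipeline(TfidfVectorizer(), LogisticRegression())
step1.fit(
    ["refactor parser for clarity", "extract helper method",
     "fix crash on empty input", "add login feature"],
    [1, 1, 0, 0],  # 1 = commit describes a refactoring
)

step2 = make_pipeline(TfidfVectorizer(), LogisticRegression())
step2.fit(
    ["simplify internal structure", "improve API readability",
     "remove duplicated code smell"],
    ["Internal QA", "External QA", "Code Smell Resolution"],
)

message = ["refactor duplicated validation logic"]
if step1.predict(message)[0] == 1:
    print(step2.predict(message)[0])
```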
arXiv Detail & Related papers (2020-09-19T18:35:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.