Related papers: ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells

ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells

URL: http://arxiv.org/abs/2507.12561v2
Date: Sun, 20 Jul 2025 19:14:49 GMT
Title: ROSE: Transformer-Based Refactoring Recommendation for Architectural Smells
Authors: Samal Nursapa, Anastassiya Samuilova, Alessio Bucaioni, Phuong T. Nguyen,
Abstract summary: Existing tools detect such smells but rarely suggest how to fix them.<n>We frame the task as a three-class classification problem and fine-tune both models on over 2 million instances mined from 11,149 open-source Java projects.<n>Our results show that transformer-based models can effectively bridge the gap between smell detection and actionable repair.
Score: 3.946103868607285
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Architectural smells such as God Class, Cyclic Dependency, and Hub-like Dependency degrade software quality and maintainability. Existing tools detect such smells but rarely suggest how to fix them. This paper explores the use of pre-trained transformer models--CodeBERT and CodeT5--for recommending suitable refactorings based on detected smells. We frame the task as a three-class classification problem and fine-tune both models on over 2 million refactoring instances mined from 11,149 open-source Java projects. CodeT5 achieves 96.9% accuracy and 95.2% F1, outperforming CodeBERT and traditional baselines. Our results show that transformer-based models can effectively bridge the gap between smell detection and actionable repair, laying the foundation for future refactoring recommendation systems. We release all code, models, and data under an open license to support reproducibility and further research.

Related papers

RefModel: Detecting Refactorings using Foundation Models [2.2670483018110366]
We investigate the viability of using foundation models for detection, implemented in a tool named RefModel.<n>We evaluate Phi4-14B, and Claude 3.5 Sonnet on a dataset of 858 single-operation transformations applied to artificially generated Java programs.<n>In real-world settings, Claude 3.5 Sonnet and Gemini 2.5 Pro jointly identified 97% of all transformations, surpassing the best-performing static-analysis-based tools.
arXiv Detail & Related papers (2025-07-15T14:20:56Z)
Exploring Diffusion Transformer Designs via Grafting [82.91123758506876]
We present grafting, a simple approach for editing pretrained diffusion transformers (DiTs) to materialize new architectures under small compute budgets.<n>We show that new diffusion model designs can be explored by grafting pretrained DiTs, with edits ranging from operator replacement to architecture restructuring.
arXiv Detail & Related papers (2025-06-05T17:59:40Z)
Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.<n>However, improvement is plateauing due to the exhaustion of readily available high-quality data.<n>We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
Automated Unit Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.<n>We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test in Java projects.<n>We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z)
LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. We build a novel data-cleaning pipeline that uses these principles to transform existing programs. We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
Predicting the Impact of Batch Refactoring Code Smells on Application Resource Consumption [3.5557219875516646]
This paper determines the relationship between software code smell batch, and resource consumption. Next, it aims to design algorithms to predict the impact of code smell on resource consumption.
arXiv Detail & Related papers (2023-06-27T19:28:05Z)
DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem. The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network. To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures. We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS. Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior. This study categorizes commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories. To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)
Toward the Automatic Classification of Self-Affirmed Refactoring [22.27416971215152]
Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their activities in commit messages. We propose a two-step approach to first identify whether a commit describes developer-related events, then to classify it according to the common quality improvement categories. Our model is able to accurately classify commits, outperforming the pattern-based random approaches, and allowing the discovery of 40 more relevant SAR patterns.
arXiv Detail & Related papers (2020-09-19T18:35:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.