Refactoring Codebases through Library Design
- URL: http://arxiv.org/abs/2506.11058v3
- Date: Sun, 05 Oct 2025 16:31:35 GMT
- Title: Refactoring Codebases through Library Design
- Authors: Ziga Kovacic, Justin T. Chiu, Celine Lee, Wenting Zhao, Kevin Ellis
- Abstract summary: We investigate code agents' capacity to refactor code in ways that support growth and reusability. We present both a benchmark and a method for generating reusable libraries. We compare Librarian to state-of-the-art library generation methods, and study it on real-world code bases.
- Score: 21.039476331720312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maintainable and general software allows developers to build robust applications efficiently, yet achieving these qualities often requires refactoring specialized solutions into reusable components. This challenge becomes particularly relevant as code agents are increasingly used to solve isolated one-off programming problems. We investigate code agents' capacity to refactor code in ways that support growth and reusability. We first investigate what makes a good refactoring, finding via simulation results and a human study that Minimum Description Length best correlates with preferable refactorings. We then present both a benchmark and a method for refactoring: MiniCode, a benchmark where multiple files must be refactored into a shared library, and Librarian, a sample-and-rerank method for generating reusable libraries. We compare Librarian to state-of-the-art library generation methods, and study it on real-world code bases.
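The sample-and-rerank idea can be pictured with a small sketch. The snippet below is illustrative rather than the paper's implementation: it uses compressed size as a crude stand-in for Minimum Description Length, and the function names are hypothetical.

```python
import zlib

def description_length(code: str) -> int:
    """Crude MDL proxy: the compressed size of the source text.
    Less redundancy (e.g., less duplication) compresses smaller."""
    return len(zlib.compress(code.encode("utf-8")))

def sample_and_rerank(candidates: list[str]) -> str:
    """Given candidate refactorings of the same codebase (e.g., sampled from
    an LLM), keep the one that minimizes the MDL proxy. In practice each
    candidate should also pass the test suite before being ranked."""
    return min(candidates, key=description_length)

# Two behaviorally equivalent versions: the second shares a helper.
duplicated = "def f(x):\n    return x * 2\n\ndef g(x):\n    return x * 2\n"
shared = "def double(x):\n    return x * 2\n\nf = double\ng = double\n"
print(description_length(duplicated), description_length(shared))
print(sample_and_rerank([duplicated, shared]))
```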
Related papers
- CodeTaste: Can LLMs Generate Human-Level Code Refactorings? [2.447746234944228]
Large language model (LLM) coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through refactoring: behavior-preserving program transformations that improve structure and maintainability. We present CodeTaste, a benchmark of refactoring tasks mined from large-scale multi-file changes in open-source repositories.
arXiv Detail & Related papers (2026-03-04T15:34:18Z)
- SWE-Refactor: A Repository-Level Benchmark for Real-World LLM-Based Code Refactoring [20.694251041823097]
Large Language Models (LLMs) have attracted wide interest for tackling software engineering tasks. Existing benchmarks commonly suffer from three shortcomings. SWE-Refactor comprises 1,099 developer-written, behavior-preserving refactorings mined from 18 Java projects.
arXiv Detail & Related papers (2026-02-03T16:36:29Z)
- AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion [55.21541958868449]
We propose AlignCoder, a repository-level code completion framework. Our framework generates an enhanced query that bridges the semantic gap between the initial query and the target code. We employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval.
arXiv Detail & Related papers (2026-01-27T15:23:14Z)
- Relating Complexity, Explicitness, Effectiveness of Refactorings and Non-Functional Requirements: A Replication Study [39.82126443893643]
Self-affirmed refactoring (SAR) is where developers explicitly state their intent to refactor. This study expanded the scope of Soares et al.'s study by doubling the number of projects and using a significantly larger set of validated refactoring instances. We observed that when developers explicitly state their intent, the resulting changes typically involve a combination of different refactoring types, making them more complex.
arXiv Detail & Related papers (2025-05-12T19:26:33Z)
- Assessing the Bug-Proneness of Refactored Code: A Longitudinal Multi-Project Study [43.65862440745159]
Refactoring is a common practice in software development, aimed at improving the internal code structure in order to make it easier to understand and modify. It is often assumed that refactoring makes the code less prone to bugs. However, in practice, refactoring is a complex task applied in different ways, and certain refactorings can inadvertently make the code more prone to bugs.
arXiv Detail & Related papers (2025-05-12T19:12:30Z)
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [57.09163579304332]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which interprets implementation-specific details; and generation, which produces the code. We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z)
- An Empirical Study on the Code Refactoring Capability of Large Language Models [0.5852077003870416]
This study empirically evaluates StarCoder2, an LLM optimized for code generation, in code refactoring across 30 open-source Java projects.
We compare StarCoder2's performance against human developers, focusing on (1) code quality improvements, (2) the types and effectiveness of the refactorings applied to code smells, and (3) enhancements through one-shot and chain-of-thought prompting.
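The two prompting styles the study compares can be sketched as templates. The wording below is hypothetical, not the study's actual prompts.

```python
# Hedged sketch of one-shot vs. chain-of-thought refactoring prompts.
ONE_SHOT = """Refactor the following Java method to remove the code smell.
Example:
  Before: <smelly method>   After: <refactored method>
Now refactor:
{code}
"""

CHAIN_OF_THOUGHT = """Refactor the following Java method.
Think step by step:
1. Identify the code smell (e.g., long method, duplicated code).
2. Choose a refactoring (e.g., Extract Method) that removes it.
3. Apply it while preserving behavior, then output only the final code.
{code}
"""

def build_prompt(code: str, cot: bool = True) -> str:
    """Fill the chosen template with the method under refactoring."""
    template = CHAIN_OF_THOUGHT if cot else ONE_SHOT
    return template.format(code=code)
```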
arXiv Detail & Related papers (2024-11-04T17:46:20Z)
- DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components.
Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods.
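MBR decoding over execution results, as described above, picks the candidate whose behavior agrees most with its peers. A minimal sketch, assuming each candidate defines a function `f` and using an exact-match utility:

```python
def mbr_select(candidates: list[str], inputs: list[int]) -> str:
    """Run every candidate on shared inputs and keep the one whose
    outputs agree with the most peers (exact-match utility)."""
    def run(src: str) -> tuple:
        env: dict = {}
        exec(src, env)                         # each candidate defines f()
        return tuple(env["f"](x) for x in inputs)
    outs = [run(c) for c in candidates]
    def score(i: int) -> int:
        return sum(outs[i] == o for o in outs)  # self-match shifts all scores equally
    return candidates[max(range(len(candidates)), key=score)]

candidates = [
    "def f(x): return x + x",
    "def f(x): return 2 * x",
    "def f(x): return x ** 2",   # the behavioral outlier
]
print(mbr_select(candidates, [1, 2, 3]))  # one of the two agreeing programs
```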
arXiv Detail & Related papers (2024-08-25T07:10:36Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
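A retrieval-augmented setup of the kind the benchmark evaluates reduces to two steps: retrieve context from one or more source pools, then prepend it to the generation prompt. The sketch below uses a toy lexical ranker; real systems use BM25 or dense retrieval, and the names here are hypothetical.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_rag_prompt(task: str, sources: dict[str, list[str]]) -> str:
    """Assemble a prompt from contexts retrieved out of several source
    pools (e.g., library docs, tutorials, repository files)."""
    parts = []
    for name, pool in sources.items():
        for doc in retrieve(task, pool):
            parts.append(f"# from {name}:\n{doc}")
    parts.append(f"# Task:\n{task}")
    return "\n\n".join(parts)

prompt = build_rag_prompt(
    "parse a yaml config",
    {"docs": ["yaml.safe_load parses YAML text", "json.loads parses JSON"]},
)
print(prompt)
```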
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Refactoring for Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
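ReGAL's output is a library of shared functions. One crude way to surface candidates for such a library is to look for function bodies duplicated across programs; the sketch below is a simplification for illustration, not ReGAL's algorithm (which refactors via an LLM).

```python
import ast
from collections import defaultdict

def shared_function_candidates(programs: dict[str, str]) -> dict[str, list[str]]:
    """Group syntactically identical function bodies that occur in more
    than one program; these are candidates to lift into a shared library."""
    groups: dict[str, list[str]] = defaultdict(list)
    for fname, src in programs.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.FunctionDef):
                body = ast.dump(ast.Module(body=node.body, type_ignores=[]))
                groups[body].append(f"{fname}:{node.name}")
    return {k: v for k, v in groups.items() if len(v) > 1}

progs = {
    "a.py": "def scale(x):\n    return x * 2\n",
    "b.py": "def grow(x):\n    return x * 2\n",   # same body, different name
}
print(list(shared_function_candidates(progs).values()))  # [['a.py:scale', 'b.py:grow']]
```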
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
- Fixing Your Own Smells: Adding a Mistake-Based Familiarisation Step When Teaching Code Refactoring [2.021502591596062]
Students must first complete a programming exercise to ensure they will produce a code smell.
This simple intervention is based on the idea that learning is easier if students are familiar with the code.
We conducted a study with 35 novice undergraduates in which they completed various exercises alternately taught using a traditional and our 'mistake-based' approach.
arXiv Detail & Related papers (2024-01-02T03:39:19Z)
- A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware [13.27883339389175]
We propose a novel code generation framework, dubbed A3-CodGen, to harness information within the code repository to generate code with fewer potential logical errors.
Results demonstrate that by adopting the A3-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code.
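The three kinds of awareness in this entry amount to assembling one prompt from three labelled context sources. A minimal sketch; the section labels and function name are hypothetical, not A3-CodGen's actual format.

```python
def fuse_repository_context(local_code: str,
                            global_snippets: list[str],
                            library_apis: list[str],
                            instruction: str) -> str:
    """Fuse local (current file), global (similar code elsewhere in the
    repository), and third-party-library information into one prompt, so
    the LLM can reuse existing code instead of re-implementing it."""
    sections = [
        "## Local context (current file)\n" + local_code,
        "## Global context (similar repository code)\n" + "\n".join(global_snippets),
        "## Available third-party APIs\n" + "\n".join(library_apis),
        "## Task\n" + instruction,
    ]
    return "\n\n".join(sections)

print(fuse_repository_context(
    "def load(path): ...",
    ["def load_json(path): ..."],
    ["requests.get(url)"],
    "Add a loader that fetches the config over HTTP.",
))
```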
arXiv Detail & Related papers (2023-12-10T05:36:06Z)
- Software refactoring and rewriting: from the perspective of code transformations [0.0]
We can borrow ideas from micropass/nanopass compilers.
By treating the procedure of software refactoring as composing code transformations, we can often obtain representations of refactoring processes short enough that their correctness can be analysed manually.
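The micropass/nanopass idea above: express a large rewrite as a composition of small passes, each simple enough to check by hand. A sketch over plain source text (realistic passes would transform an AST; the pass names are invented).

```python
from typing import Callable

Pass = Callable[[str], str]

def compose_passes(*passes: Pass) -> Pass:
    """Compose many small, individually checkable rewrites into one
    refactoring, in the spirit of micropass/nanopass compilers."""
    def run(code: str) -> str:
        for p in passes:
            code = p(code)
        return code
    return run

def rename_tmp(src: str) -> str:
    return src.replace("tmp", "total")

def strip_trailing_ws(src: str) -> str:
    return "\n".join(line.rstrip() for line in src.splitlines())

refactor = compose_passes(rename_tmp, strip_trailing_ws)
print(refactor("tmp = 1  \nprint(tmp)"))
```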
arXiv Detail & Related papers (2023-08-12T17:11:54Z)
- Empirical Evaluation of a Live Environment for Extract Method Refactoring [0.0]
We developed a Live Refactoring Environment that visually identifies, recommends, and applies Extract Methods.
Our results were significantly different and better than the ones from refactoring the code manually without further help.
arXiv Detail & Related papers (2023-07-20T16:36:02Z)
- Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual developer intervention to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
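The iterative loop can be sketched in a few lines: each draft completion is fed back into retrieval to find more relevant context for the next round. Here `generate` and `retrieve` are toy stand-ins for a code LLM and a similarity-based retriever, not RepoCoder's actual components.

```python
def iterative_complete(query: str, repo: list[str],
                       generate, retrieve, rounds: int = 2) -> str:
    """Iterative retrieval and generation: the draft from round i is
    used to retrieve better context for round i + 1."""
    context = retrieve(query, repo)
    draft = ""
    for _ in range(rounds):
        draft = generate(query, context)
        context = retrieve(query + "\n" + draft, repo)  # re-retrieve with draft
    return draft

# Toy stand-ins: overlap-based retrieval and a "generator" that echoes context.
toy_retrieve = lambda q, repo: sorted(
    repo, key=lambda d: -len(set(q.split()) & set(d.split())))[:2]
toy_generate = lambda q, ctx: "# completion using: " + " | ".join(ctx)

repo = ["def parse_config(path):", "def train_model(data):"]
print(iterative_complete("parse config", repo, toy_generate, toy_retrieve))
```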
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation; and 4) it can be used efficiently in the context of the provided code.
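The four criteria above suggest a simple linear ranking. The sketch below combines them with illustrative weights; the feature values and weights are made up for demonstration, not the paper's model.

```python
def recommendation_score(co_usage_freq: float,
                         sim_to_imports: float,
                         sim_to_user_code: float,
                         context_fit: float,
                         weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Combine the four signals into one ranking score.
    All inputs are assumed normalized to [0, 1]."""
    features = (co_usage_freq, sim_to_imports, sim_to_user_code, context_fit)
    return sum(w * f for w, f in zip(weights, features))

# Rank candidate packages by the combined score (feature values invented).
candidates = {"requests": (0.9, 0.7, 0.3, 0.8), "urllib3": (0.6, 0.8, 0.2, 0.7)}
ranked = sorted(candidates, key=lambda p: -recommendation_score(*candidates[p]))
print(ranked)
```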
arXiv Detail & Related papers (2022-10-11T12:30:05Z)
- How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their refactoring of code smells.
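The supervised setup described above reduces to text classification over commit messages. A minimal sketch with scikit-learn; the tiny training set is invented for illustration, with the study's categories as labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; the study used thousands of labelled commits.
messages = [
    "extract method to remove duplicated logic",   # Internal QA
    "improve UI responsiveness for end users",     # External QA
    "remove long method smell in parser",          # Code Smell Resolution
    "fix crash when config file is missing",       # BugFix
    "add export-to-csv feature",                   # Functional
]
labels = ["Internal QA", "External QA", "Code Smell Resolution",
          "BugFix", "Functional"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(messages, labels)
print(clf.predict(["split god class and extract helper methods"]))
```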
arXiv Detail & Related papers (2020-10-26T20:33:17Z)