Bugs in the Shadows: Static Detection of Faulty Python Refactorings
- URL: http://arxiv.org/abs/2507.01103v1
- Date: Tue, 01 Jul 2025 18:03:56 GMT
- Title: Bugs in the Shadows: Static Detection of Faulty Python Refactorings
- Authors: Jonhnanthan Oliveira, Rohit Gheyi, Márcio Ribeiro, Alessandro Garcia,
- Abstract summary: Python's dynamic type system poses significant challenges for automated code transformations. Our analysis uncovered 29 bugs across four refactoring types from a total of 1,152 refactoring attempts. These results highlight the need to improve the robustness of current Python refactoring tools to ensure the correctness of automated code transformations.
- Score: 44.115219601924856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Python is a widely adopted programming language, valued for its simplicity and flexibility. However, its dynamic type system poses significant challenges for automated refactoring - an essential practice in software evolution aimed at improving internal code structure without changing external behavior. Understanding how type errors are introduced during refactoring is crucial, as such errors can compromise software reliability and reduce developer productivity. In this work, we propose a static analysis technique to detect type errors introduced by refactoring implementations for Python. We evaluated our technique on Rope refactoring implementations, applying them to open-source Python projects. Our analysis uncovered 29 bugs across four refactoring types from a total of 1,152 refactoring attempts. Several of these issues were also found in widely used IDEs such as PyCharm and PyDev. All reported bugs were submitted to the respective developers, and some of them were acknowledged and accepted. These results highlight the need to improve the robustness of current Python refactoring tools to ensure the correctness of automated code transformations and support reliable software maintenance.
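The paper does not prescribe an implementation in this abstract, but the general idea of flagging refactoring-introduced type errors can be sketched by type-checking a project before and after a Rope refactoring and diffing the results. The sketch below is illustrative only, not the authors' exact technique; it assumes mypy as the type checker, and the project path, module name, rename offset, and new name are placeholders.

```python
import subprocess
from rope.base.project import Project
from rope.refactor.rename import Rename

def type_errors(path: str) -> set[str]:
    """Collect mypy error lines for the project at `path` (static check only)."""
    result = subprocess.run(["mypy", path], capture_output=True, text=True)
    return {line for line in result.stdout.splitlines() if " error: " in line}

PROJECT_PATH = "path/to/project"  # placeholder project snapshot
before = type_errors(PROJECT_PATH)

# Apply a Rope rename refactoring; module, offset, and new name are placeholders.
project = Project(PROJECT_PATH)
resource = project.get_resource("module.py")
changes = Rename(project, resource, offset=120).get_changes("new_name")
project.do(changes)
project.close()

after = type_errors(PROJECT_PATH)

# Errors present only after the refactoring are candidate refactoring-introduced bugs.
for error in sorted(after - before):
    print("introduced:", error)
```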
Related papers
- Assessing the Bug-Proneness of Refactored Code: A Longitudinal Multi-Project Study [43.65862440745159]
Refactoring is a common practice in software development, aimed at improving the internal code structure in order to make it easier to understand and modify. It is often assumed that refactoring makes the code less prone to bugs. However, in practice, refactoring is a complex task and is applied in different ways. Therefore, certain refactorings can inadvertently make the code more prone to bugs.
arXiv Detail & Related papers (2025-05-12T19:12:30Z)
- ActRef: Enhancing the Understanding of Python Code Refactoring with Action-Based Analysis [10.724563250102696]
This study presents an action-based Refactoring Analysis Framework named ActRef. ActRef mines multiple refactoring types (e.g., move, rename, extract, and inline operations) based on diff actions. By focusing on code change actions, ActRef provides a Python-adaptive solution for detecting intricate refactoring patterns.
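As a rough, self-contained illustration of diff/AST-based refactoring detection (not ActRef's actual algorithm), the sketch below flags a likely function rename when a definition disappears from one version and an identical body reappears under a new name:

```python
import ast

def function_bodies(source: str) -> dict[str, str]:
    """Map each top-level function name to a normalized dump of its body."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(ast.Module(body=node.body, type_ignores=[]))
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

def detect_renames(old_src: str, new_src: str) -> list[tuple[str, str]]:
    """Report (old_name, new_name) pairs whose bodies are identical."""
    old, new = function_bodies(old_src), function_bodies(new_src)
    renames = []
    for old_name, body in old.items():
        if old_name in new:
            continue  # still present under the same name, not a rename
        for new_name, new_body in new.items():
            if new_name not in old and new_body == body:
                renames.append((old_name, new_name))
    return renames

print(detect_renames("def f(x):\n    return x + 1\n",
                     "def g(x):\n    return x + 1\n"))  # [('f', 'g')]
```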
arXiv Detail & Related papers (2025-05-10T07:48:50Z)
- Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs [0.6133301815445301]
This study evaluates the effectiveness of Small Language Models (SLMs) in detecting two types of refactoring bugs in Java and Python. The study covers 16 refactoring types and employs zero-shot prompting on consumer-grade hardware to evaluate the models' ability to reason about refactoring correctness without explicit prior training. The proprietary o3-mini-high model achieved the highest detection rate, identifying 84.3% of Type I bugs.
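A zero-shot setup of this kind boils down to presenting the code before and after the refactoring and asking the model whether behavior is preserved. The template below is a hypothetical illustration, not the prompt used in the study:

```python
PROMPT_TEMPLATE = """You are reviewing an automated refactoring.

Original code:
{before}

Refactored code:
{after}

Question: Does the refactored code preserve the behavior of the original?
Answer "yes" or "no", then briefly explain any bug you find."""

def build_prompt(before: str, after: str) -> str:
    """Fill the zero-shot template with a before/after pair (no worked examples)."""
    return PROMPT_TEMPLATE.format(before=before, after=after)

# Hypothetical example: an inline refactoring that breaks the original call.
prompt = build_prompt(
    "def area(w, h=1):\n    return w * h\n\nprint(area(3))",
    "print(3 * h)",
)
# `prompt` would then be sent to a small language model for a yes/no judgment.
```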
arXiv Detail & Related papers (2025-02-25T18:52:28Z)
- Refactoring Detection in C++ Programs with RefactoringMiner++ [45.045206894182776]
We present RefactoringMiner++, a refactoring detection tool based on the current state of the art: RefactoringMiner 3. While the latter focuses exclusively on Java, our tool is -- to the best of our knowledge -- the first publicly available refactoring detection tool for C++ projects.
arXiv Detail & Related papers (2025-02-24T23:17:35Z)
- An Empirical Study of Refactoring Engine Bugs [7.412890903261693]
We present the first systematic study of refactoring engine bugs by analyzing bugs in Eclipse, IntelliJ IDEA, and NetBeans.
We analyzed these bugs according to their types, symptoms, root causes, and triggering conditions.
Our transferability study revealed 130 new bugs in the latest version of those engines.
arXiv Detail & Related papers (2024-09-22T22:09:39Z)
- Detecting Refactoring Commits in Machine Learning Python Projects: A Machine Learning-Based Approach [3.000496428347787]
MLRefScanner identifies refactoring commits involving both ML-specific and general refactoring operations.
Our study highlights the potential of ML-driven approaches in detecting refactoring across diverse programming languages and technical domains.
arXiv Detail & Related papers (2024-04-09T18:46:56Z)
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Refactoring for Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
In contrast, static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
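The core idea can be sketched in a few lines: parse each generated completion with Python's ast module and flag those that fail, without executing anything. This is a simplified stand-in for the paper's framework, which also classifies finer-grained error types:

```python
import ast

def static_error(completion: str) -> str | None:
    """Return a short error label if the completion fails to parse, else None."""
    try:
        ast.parse(completion)
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None

completions = [
    "def add(a, b):\n    return a + b\n",   # parses cleanly
    "def add(a, b):\n    return a +\n",     # incomplete expression
]
errors = [e for c in completions if (e := static_error(c)) is not None]
print(f"{len(errors)}/{len(completions)} completions have static errors", errors)
```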
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
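Schematically, one variant of such a loop generates code, runs it against feedback (for example unit tests), and feeds any failure message back to the model for another attempt. In the sketch below, generate() is a hypothetical placeholder for the language model call, not an API from the paper:

```python
def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to a large language model."""
    raise NotImplementedError

def run_tests(code: str, test: str) -> str | None:
    """Execute the candidate against a test snippet; return the error text, if any."""
    try:
        exec(code + "\n" + test, {})
        return None
    except Exception as exc:  # any failure becomes feedback for the model
        return f"{type(exc).__name__}: {exc}"

def self_debug(task: str, test: str, max_rounds: int = 3) -> str:
    code = generate(task)
    for _ in range(max_rounds):
        feedback = run_tests(code, test)
        if feedback is None:
            break  # tests pass, stop refining
        # Feed the failure back so the model can explain and repair its own code.
        code = generate(f"{task}\n\nYour code:\n{code}\n\nIt failed with:\n{feedback}\nFix it.")
    return code
```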
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
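A condensed sketch of this data-generation loop, using a Python syntax check as the critic and placeholder callables for the fixer and breaker models (hypothetical interfaces, not the released implementation):

```python
import ast

def critic(code: str) -> bool:
    """A simple critic for Python: does the code parse?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def bifi_round(fixer, breaker, real_bad: list[str]):
    """One round: harvest critic-verified (bad, good) pairs, then synthesize more."""
    paired = []
    for bad in real_bad:
        good = fixer(bad)          # fixer proposes a repair for a real bad input
        if critic(good):           # keep it only if the critic accepts the output
            paired.append((bad, good))
    # The breaker, trained on (good -> bad) pairs, corrupts clean code into
    # realistic bad examples that further train the fixer (training code omitted).
    synthetic = [(breaker(good), good) for _, good in paired]
    return paired + synthetic

# `fixer` and `breaker` stand in for seq2seq models; here they are just callables.
```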
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.