PyMigTool: a tool for end-to-end Python library migration
- URL: http://arxiv.org/abs/2510.08810v1
- Date: Thu, 09 Oct 2025 20:54:26 GMT
- Title: PyMigTool: a tool for end-to-end Python library migration
- Authors: Mohayeminul Islam, Ajay Kumar Jha, May Mahmoud, Sarah Nadi,
- Abstract summary: We develop an end-to-end solution that can automatically migrate code between any arbitrary pair of Python libraries. We first study the capabilities of Large Language Models (LLMs) for library migration on a benchmark of 321 real-world library migrations. We find that LLMs can effectively perform library migration, but some post-processing steps can further improve the performance.
- Score: 0.8586348698580818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Library migration is the process of replacing a library with a similar one in a software project. Manual library migration is time-consuming and error-prone, as it requires developers to understand the Application Programming Interfaces (APIs) of both libraries, map equivalent APIs, and perform the necessary code transformations. Due to the difficulty of the library migration process, most existing automated techniques and tools stop at the API mapping stage or support a limited set of libraries and code transformations. In this paper, we develop an end-to-end solution that can automatically migrate code between any arbitrary pair of Python libraries that provide similar functionality. Given the promising capabilities of Large Language Models (LLMs) in code generation and transformation, we use LLMs as the primary engine for migration. Before building the tool, we first study the capabilities of LLMs for library migration on a benchmark of 321 real-world library migrations. We find that LLMs can effectively perform library migration, but some post-processing steps can further improve the performance. Based on this, we develop PyMigTool, a command-line application that combines the power of LLMs, static analysis, and dynamic analysis to provide accurate library migration. We evaluate PyMigTool on 717 real-world Python applications that are not from our benchmark. We find that PyMigTool completes 32% of the migrations with full correctness. Of the remaining migrations, for more than half of the projects only 14% of the migration-related changes are left for developers to fix.
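As described in the abstract, the tool pairs LLM output with static and dynamic analysis. A minimal sketch of the kind of static post-check such a pipeline might run (an illustration only, not PyMigTool's actual implementation; the `requests`/`httpx` pair is a hypothetical example): verify that a migrated file no longer imports the replaced library and does import the target one.

```python
import ast

def imports_in(source: str) -> set:
    """Collect top-level module names imported by a Python source string."""
    tree = ast.parse(source)
    mods = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

def migration_complete(source: str, old_lib: str, new_lib: str) -> bool:
    """True if the old library's imports are gone and the new library is imported."""
    mods = imports_in(source)
    return old_lib not in mods and new_lib in mods

migrated = "import httpx\nresp = httpx.get('https://example.com')\n"
print(migration_complete(migrated, old_lib="requests", new_lib="httpx"))  # True
```

A real pipeline would add dynamic checks on top, such as running the project's test suite after the LLM rewrite.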
Related papers
- MigMate: A VS Code Extension for LLM-based Library Migration of Python Projects [0.8586348698580818]
Our previous research developed MigrateLib, a command-line LLM-based migration tool. MigMate builds on MigrateLib by integrating the automated migration process into the developer's existing development environment. A preliminary user study shows that plugin usage consistently reduces the time taken to complete a library migration task.
arXiv Detail & Related papers (2026-03-02T08:26:31Z)
- Gecko: A Simulation Environment with Stateful Feedback for Refining Agent Tool Calls [56.407063247662336]
We introduce Gecko, a comprehensive environment that simulates tool responses using a combination of rules and LLMs. GATS consistently improves the tool-calling performance of various LLMs including GPT-4o, GPT-5, and Gemini-3.0-pro.
arXiv Detail & Related papers (2026-02-22T15:02:00Z)
- SPELL: Synthesis of Programmatic Edits using LLMs [10.41623927140964]
Library migration is a common but error-prone task in software development. We present a new approach to automated API migration that sidesteps the limitations described above.
arXiv Detail & Related papers (2026-02-01T09:03:56Z)
- Analyzing C/C++ Library Migrations at the Package-level: Prevalence, Domains, Targets and Rationales across Seven Package Management Tools [11.76396912076385]
This paper analyzes 19,943 C/C++ projects that utilize different package management tools and establishes the first C/C++ library migration dataset. We find that the overall trend in the number of C/C++ library migrations is similar to Java. We also find four C/C++-specific migration reasons, such as reducing compile time and unifying dependency management.
arXiv Detail & Related papers (2025-07-04T02:44:38Z)
- Automatic Qiskit Code Refactoring Using Large Language Models [39.71511919246829]
We present a novel methodology for refactoring Qiskit code using large language models (LLMs). We begin by extracting a taxonomy of migration scenarios from the different sources of official Qiskit documentation. This taxonomy, along with the original Python source code, is provided as input to an LLM, which is then tasked with identifying instances of migration scenarios in the code.
arXiv Detail & Related papers (2025-06-17T14:00:48Z)
- Using LLMs for Library Migration [1.9247157750972368]
Large Language Models (LLMs) are good at generating and transforming code and finding similar code. We evaluate three LLMs, Llama 3.1, GPT-4o mini, and GPT-4o, on PyMigBench, a benchmark of 321 real-world library migrations. Llama 3.1, GPT-4o mini, and GPT-4o correctly migrate 89%, 89%, and 94% of the migration-related code changes, respectively.
arXiv Detail & Related papers (2025-04-17T18:32:48Z)
- Migrating Code At Scale With LLMs At Google [0.0]
We discuss a large-scale, costly and traditionally manual migration project at Google. We propose a novel automated algorithm that uses change location discovery and a Large Language Model (LLM) to aid developers in conducting the migration. Our results suggest that our automated, LLM-assisted workflow can serve as a model for similar initiatives.
arXiv Detail & Related papers (2025-04-13T18:52:44Z)
- MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions [53.811953357289866]
Large language models (LLMs) have shown remarkable progress across various domains. However, LLMs struggle with incomplete code context understanding and inaccurate migration point identification. MigGPT is a framework that employs a novel code fingerprint structure to retain code snippet information.
arXiv Detail & Related papers (2025-04-13T08:08:37Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
- Executable Code Actions Elicit Better LLM Agents [76.95566120678787]
This work proposes to use Python code to consolidate Large Language Model (LLM) agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions.
The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language.
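The multi-turn loop summarized above, executing code actions in a persistent interpreter namespace and feeding captured output back as observations, can be sketched in a few lines (a simplified illustration, not the CodeAct implementation; a real agent would place an LLM between turns):

```python
import io
import contextlib

def run_code_action(code: str, env: dict) -> str:
    """Execute one code action in a persistent namespace and return
    the captured stdout as the observation for the next turn."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, env)  # env persists, so later actions see earlier state
    return buf.getvalue()

env = {}
print(run_code_action("x = 2 + 3\nprint(x)", env), end="")  # 5
print(run_code_action("print(x * 10)", env), end="")         # 50
```

The shared `env` dictionary is what lets a later action revise or build on the results of earlier ones.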
arXiv Detail & Related papers (2024-02-01T21:38:58Z)
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework that transforms diverse and lengthy tool documentation into unified and concise tool instructions.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Characterizing Python Library Migrations [2.2557806157585834]
We label 3,096 migration-related code changes in 335 Python library migrations.
We find that 40% of library pairs have API mappings that involve non-function program elements.
On average, a developer needs to learn about 4 APIs and 2 API mappings to perform a migration.
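The API mappings counted above can be viewed as rename rules applied during migration. A toy sketch of applying a one-to-one function mapping with Python's `ast` module (the `requests.get` → `httpx.get` mapping is a hypothetical example, not drawn from the paper's dataset):

```python
import ast

# Hypothetical one-to-one API mapping: (module, function) -> (module, function).
API_MAP = {("requests", "get"): ("httpx", "get")}

class ApiMapper(ast.NodeTransformer):
    """Rewrite `old_mod.old_fn(...)` calls according to API_MAP."""
    def visit_Call(self, node):
        self.generic_visit(node)
        f = node.func
        if isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name):
            target = API_MAP.get((f.value.id, f.attr))
            if target:
                node.func = ast.Attribute(
                    value=ast.Name(id=target[0], ctx=ast.Load()),
                    attr=target[1], ctx=ast.Load())
        return node

tree = ApiMapper().visit(ast.parse("resp = requests.get(url)"))
print(ast.unparse(ast.fix_missing_locations(tree)))  # resp = httpx.get(url)
```

Real migrations are harder than this sketch suggests: as the study notes, many mappings involve non-function program elements, which simple call renaming cannot handle.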
arXiv Detail & Related papers (2022-07-03T21:00:08Z)
- BOML: A Modularized Bilevel Optimization Library in Python for Meta Learning [52.90643948602659]
BOML is a modularized optimization library that unifies several meta-learning algorithms into a common bilevel optimization framework.
It provides a hierarchical optimization pipeline together with a variety of iteration modules, which can be used to solve the mainstream categories of meta-learning methods.
arXiv Detail & Related papers (2020-09-28T14:21:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.