Using LLMs for Library Migration
- URL: http://arxiv.org/abs/2504.13272v1
- Date: Thu, 17 Apr 2025 18:32:48 GMT
- Title: Using LLMs for Library Migration
- Authors: Md Mohayeminul Islam, Ajay Kumar Jha, May Mahmoud, Ildar Akhmetov, Sarah Nadi
- Abstract summary: Large Language Models (LLMs) are good at generating and transforming code and finding similar code. We evaluate three LLMs, Llama 3.1, GPT-4o mini, and GPT-4o, on PyMigBench, performing 321 real-world library migrations. Llama 3.1, GPT-4o mini, and GPT-4o correctly migrate 89%, 89%, and 94% of the migration-related code changes, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Library migration is the process of replacing a used software library with another library that provides similar functionality. Manual library migration is time-consuming and error-prone, as it requires developers to understand the APIs of both libraries, map them, and perform the necessary code transformations. Due to this difficulty, most existing automated techniques and tools stop at the API mapping stage or support a limited set of code transformations. On the other hand, Large Language Models (LLMs) are good at generating and transforming code and finding similar code, which are the necessary upstream tasks for library migration. Such capabilities suggest that LLMs may be suitable for library migration. Therefore, in this paper, we investigate the effectiveness of LLMs for migration between Python libraries. We evaluate three LLMs, Llama 3.1, GPT-4o mini, and GPT-4o, on PyMigBench, performing 321 real-world library migrations that include 2,989 migration-related code changes. We measure the correctness of the migration results in two ways: we first compare the LLM's migrated code with the developers' migrated code in the benchmark, and then run the unit tests available in the client repositories. We find that Llama 3.1, GPT-4o mini, and GPT-4o correctly migrate 89%, 89%, and 94% of the migration-related code changes, respectively. We also find that 36%, 52%, and 64% of the Llama 3.1, GPT-4o mini, and GPT-4o migrations, respectively, pass the same tests that passed in the developer's migration. Overall, our results suggest that LLMs can be effective in migrating code between libraries, but we also identify cases that pose difficulties for the LLM.
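To make the migration task concrete, below is a minimal sketch of a migration-related code change, using a hypothetical requests-to-httpx migration (an illustrative library pair and client snippet, not necessarily one drawn from PyMigBench):

```python
# Illustrative migration-related code changes for a hypothetical
# requests -> httpx migration (not necessarily a PyMigBench pair).

# --- Before: client code using requests ---
import requests

def fetch_user(url: str) -> dict:
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()          # raises on 4xx/5xx responses
        return resp.json()
    except requests.exceptions.RequestException as err:
        raise RuntimeError(f"fetch failed: {err}")

# --- After: the same client code migrated to httpx ---
import httpx

def fetch_user_migrated(url: str) -> dict:
    try:
        resp = httpx.get(url, timeout=10)
        resp.raise_for_status()          # same method name, maps 1:1
        return resp.json()
    except httpx.HTTPError as err:       # exception hierarchy differs
        raise RuntimeError(f"fetch failed: {err}")
```

Most of this change is a one-to-one function mapping, but the exception type must also be remapped; changes beyond simple call renaming are exactly where techniques that stop at API mapping fall short.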
Related papers
- Migrating Code At Scale With LLMs At Google
We discuss a large-scale, costly, and traditionally manual migration project at Google. We propose a novel automated algorithm that uses change location discovery and a Large Language Model (LLM) to aid developers in conducting the migration. Our results suggest that our automated, LLM-assisted workflow can serve as a model for similar initiatives.
arXiv Detail & Related papers (2025-04-13T18:52:44Z)
- How is Google using AI for internal code migrations?
This article is an experience report on using LLMs for code migrations at Google; rather than a formal study, we share our experiences in applying LLM-based code migration in an enterprise context. We see evidence that the use of LLMs can significantly reduce the time needed for migrations.
arXiv Detail & Related papers (2025-01-12T23:06:25Z)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
- Scalable, Validated Code Translation of Entire Projects using Large Language Models
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code.
Existing works have shown a drop in translation success rates for code exceeding around 100 lines.
We develop a modular approach to translation, where we partition the code into small code fragments which can be independently translated.
We show that we can consistently generate reliable Rust for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence.
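The validation idea amounts to differential testing. A minimal sketch, assuming the original program and its translation are both available as executables (the binary paths and inputs below are hypothetical, not the paper's harness):

```python
# Minimal differential-testing sketch for I/O equivalence between an
# original program and its translation (paths/inputs are hypothetical).
import subprocess

def run(binary: str, stdin_text: str) -> str:
    """Run a program, feed it stdin, and capture its stdout."""
    result = subprocess.run(
        [binary], input=stdin_text, capture_output=True, text=True, timeout=30
    )
    return result.stdout

def io_equivalent(original: str, translated: str, inputs: list[str]) -> bool:
    """Check that both programs produce identical stdout on every input."""
    return all(run(original, i) == run(translated, i) for i in inputs)

# Hypothetical usage: compare an original binary against its Rust translation.
# print(io_equivalent("./original", "./translated_rs", ["1 2\n", "3 4\n"]))
```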
arXiv Detail & Related papers (2024-12-11T02:31:46Z)
- Example-Based Automatic Migration of Continuous Integration Systems
Continuous Integration (CI) is a widely adopted practice for faster code change integration and testing.
Developers often migrate between CI systems in pursuit of features like matrix building or better logging.
This migration is effort-intensive and error-prone owing to limited knowledge of the new CI system and its syntax. We propose CIMig, a novel approach for automatic CI system migration.
arXiv Detail & Related papers (2024-07-02T20:19:21Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering (MBC), a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
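A rough sketch of the MBC idea, assuming each task's LoRA adapter is available as a flattened parameter vector and scikit-learn is on hand (the paper's actual procedure may differ in details):

```python
# Rough sketch of model-based clustering (MBC) of tasks: group tasks by
# the similarity of their adapter parameters (details here are assumptions).
import numpy as np
from sklearn.cluster import KMeans

def cluster_tasks(adapters: dict[str, np.ndarray], n_clusters: int) -> dict[str, int]:
    """adapters maps task name -> flattened LoRA parameter vector."""
    names = list(adapters)
    X = np.stack([adapters[n] for n in names])
    # L2-normalize so k-means distance approximates cosine dissimilarity.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return dict(zip(names, labels.tolist()))

# Tasks in the same cluster would then share one jointly trained adapter.
```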
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
- MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues.
We propose a novel Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution.
arXiv Detail & Related papers (2024-03-26T17:57:57Z)
- Optimizing LLM Queries in Relational Data Analytics Workloads
Batch data analytics is a growing application for Large Language Models (LLMs).
LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets.
We propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads.
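One representative cost-reduction technique in this setting is deduplicating identical prompts across rows before calling the model. The sketch below illustrates the idea; `call_llm` is a hypothetical placeholder, not the paper's system:

```python
# Minimal sketch of one cost-reduction idea: deduplicate identical prompts
# across rows before invoking the LLM (call_llm is a hypothetical stub).
def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real LLM API call")

def classify_column(rows: list[str], template: str) -> list[str]:
    """Apply an LLM prompt template over rows, calling the LLM once per
    distinct prompt instead of once per row."""
    prompts = [template.format(value=row) for row in rows]
    cache: dict[str, str] = {}
    for p in prompts:
        if p not in cache:          # only distinct prompts hit the API
            cache[p] = call_llm(p)
    return [cache[p] for p in prompts]
```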
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
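The "coupled structures" notion can be illustrated on a toy two-layer MLP: removing hidden unit j must delete row j of the first weight matrix and column j of the second together so shapes stay consistent. A simplified numpy sketch follows; LLM-Pruner itself scores importance with gradients, so the magnitude heuristic here is an assumption for brevity:

```python
# Simplified numpy sketch of coupled structural pruning on a 2-layer MLP:
# dropping hidden unit j removes row j of W1, b1[j], and column j of W2
# together (a toy version of the idea; importance here is magnitude-based).
import numpy as np

def prune_hidden_units(W1, b1, W2, keep_ratio=0.5):
    """W1: (hidden, in), b1: (hidden,), W2: (out, hidden)."""
    hidden = W1.shape[0]
    # Importance of unit j: combined weight norm of its coupled parameters.
    importance = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)
    n_keep = max(1, int(hidden * keep_ratio))
    keep = np.sort(np.argsort(importance)[-n_keep:])  # indices of kept units
    return W1[keep], b1[keep], W2[:, keep]

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
w1p, b1p, w2p = prune_hidden_units(W1, b1, W2, keep_ratio=0.5)
assert w1p.shape == (4, 4) and w2p.shape == (3, 4)
```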
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Characterizing Python Library Migrations
We label 3,096 migration-related code changes in 335 Python library migrations.
We find that 40% of library pairs have API mappings that involve non-function program elements.
On average, a developer needs to learn about 4 APIs and 2 API mappings to perform a migration.
arXiv Detail & Related papers (2022-07-03T21:00:08Z)