What a diff makes: automating code migration with large language models
- URL: http://arxiv.org/abs/2511.00160v1
- Date: Fri, 31 Oct 2025 18:08:52 GMT
- Title: What a diff makes: automating code migration with large language models
- Authors: Katherine A. Rosenfeld, Cliff C. Kerr, Jessica Lundin,
- Abstract summary: We show that contexts containing diffs can significantly improve performance against out of the box LLMs. We provide a dataset to assist in further development of this problem area, as well as an open-source Python package, AIMigrate, that can be used to assist with migrating code bases. In a real-world migration of TYPHOIDSIM between STARSIM versions, AIMigrate correctly identified 65% of required changes in a single run, increasing to 80% with multiple runs, with 47% of changes generated perfectly.
- Score: 0.15293427903448018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern software programs are built on stacks that are often undergoing changes that introduce updates and improvements, but may also break any project that depends upon them. In this paper we explore the use of Large Language Models (LLMs) for code migration, specifically the problem of maintaining compatibility with a dependency as it undergoes major and minor semantic version changes. We demonstrate, using metrics such as test coverage and change comparisons, that contexts containing diffs can significantly improve performance against out of the box LLMs and, in some cases, perform better than using code. We provide a dataset to assist in further development of this problem area, as well as an open-source Python package, AIMigrate, that can be used to assist with migrating code bases. In a real-world migration of TYPHOIDSIM between STARSIM versions, AIMigrate correctly identified 65% of required changes in a single run, increasing to 80% with multiple runs, with 47% of changes generated perfectly.
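As a rough illustration of what a "context containing diffs" could look like, the sketch below builds a unified diff between two versions of a dependency file with Python's standard `difflib` module and places it ahead of the project code in a prompt. This is not AIMigrate's API: the function names `build_diff_context` and `build_migration_prompt`, the prompt wording, and the toy `run` function are all assumptions made for illustration.

```python
import difflib


def build_diff_context(old_source: str, new_source: str, path: str) -> str:
    """Return a unified diff between two versions of one dependency file."""
    diff_lines = difflib.unified_diff(
        old_source.splitlines(keepends=True),
        new_source.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


def build_migration_prompt(dependency_diff: str, project_code: str) -> str:
    """Assemble a prompt that shows the dependency diff before the code to migrate."""
    # Hypothetical prompt layout; the paper's actual prompts may differ.
    return (
        "The dependency changed as follows:\n"
        f"{dependency_diff}\n"
        "Update this project code to work with the new version:\n"
        f"{project_code}\n"
    )


# Hypothetical before/after versions of a dependency module.
old_api = "def run(n_steps):\n    return n_steps\n"
new_api = "def run(dur):\n    return dur\n"
# Hypothetical downstream code that still uses the old keyword argument.
project = "result = run(n_steps=10)\n"

diff_text = build_diff_context(old_api, new_api, "dep/sim.py")
prompt = build_migration_prompt(diff_text, project)
print(prompt)
```

The resulting string would then be sent to whichever LLM is in use. The diff tells the model what changed in the dependency (here, a renamed parameter) without requiring the full source of either version.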
Related papers
- MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering [54.236614097082395]
We introduce MEnvAgent, a framework for automated Environment construction. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures. MEnvData-SWE is the largest open-source polyglot dataset of realistic verifiable Docker environments to date.
arXiv Detail & Related papers (2026-01-30T11:36:10Z)
- Diffploit: Facilitating Cross-Version Exploit Migration for Open Source Library Vulnerabilities [13.559398564795048]
We propose Diffploit, an iterative, diff-driven exploit migration method structured around two key modules. We evaluate Diffploit on a large-scale dataset containing 102 Java CVEs and 689 version-migration tasks across 79 libraries. It successfully migrates 84.2% of exploits, outperforming the change-aware test repair tool TARGET by 52.0% and the rule-based tool in IDEA by 61.6%.
arXiv Detail & Related papers (2025-11-17T04:06:01Z)
- Automatic Qiskit Code Refactoring Using Large Language Models [39.71511919246829]
We present a novel methodology for refactoring Qiskit code using large language models (LLMs). We begin by extracting a taxonomy of migration scenarios from the different sources of official Qiskit documentation. This taxonomy, along with the original Python source code, is provided as input to an LLM, which is then tasked with identifying instances of migration scenarios in the code.
arXiv Detail & Related papers (2025-06-17T14:00:48Z)
- CODEMENV: Benchmarking Large Language Models on Code Migration [11.735053997817765]
CODEMENV consists of 922 examples spanning 19 Python and Java packages. It covers three core tasks: identifying functions incompatible with specific versions, detecting changes in function definitions, and adapting code to target environments. Experimental evaluation with seven LLMs on CODEMENV yields an average pass@1 rate of 26.50%, with GPT-4O achieving the highest score at 43.84%.
arXiv Detail & Related papers (2025-06-01T08:29:59Z)
- SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software iteration by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
- Migrating Code At Scale With LLMs At Google [0.0]
We discuss a large-scale, costly and traditionally manual migration project at Google. We propose a novel automated algorithm that uses change location discovery and a Large Language Model (LLM) to help developers conduct the migration. Our results suggest that our automated, LLM-assisted workflow can serve as a model for similar initiatives.
arXiv Detail & Related papers (2025-04-13T18:52:44Z)
- MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions [53.811953357289866]
Large language models (LLMs) have shown remarkable progress across various domains. LLMs struggle with incomplete code context understanding and inaccurate migration point identification. MigGPT is a framework that employs a novel code fingerprint structure to retain code snippet information.
arXiv Detail & Related papers (2025-04-13T08:08:37Z)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving competitive performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
- Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
- Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example [10.635856134931702]
Large Language Models (LLMs) are trained on vast code datasets.
We identify best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability.
Implementing these in PyCraft, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x.
arXiv Detail & Related papers (2024-02-11T09:45:00Z)
- Automated Code generation for Information Technology Tasks in YAML through Large Language Models [56.25231445614503]
We present Wisdom, a natural-language-to-YAML code generation tool, aimed at improving IT automation productivity.
We develop two novel performance metrics for YAML to capture the specific characteristics of this domain.
arXiv Detail & Related papers (2023-05-02T21:01:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.